CRISPR-Cas systems, crystal structure and uses thereof

ABSTRACT

The invention provides for systems, methods, and compositions for altering expression of target gene sequences and related gene products. Provided are structural information on the Cas protein of the CRISPR-Cas system, use of this information in generating modified components of the CRISPR complex, vectors and vector systems which encode one or more components or modified components of a CRISPR complex, as well as methods for the design and use of such vectors and components. Also provided are methods of directing CRISPR complex formation in eukaryotic cells and methods for utilizing the CRISPR-Cas system.

RELATED APPLICATIONS AND INCORPORATION BY REFERENCE

This application is a Continuation of U.S. application Ser. No. 15/171,141 filed Jul. 2, 2016, which is a Continuation-In-Part of International Application Number PCT/US2014/069925 filed Dec. 12, 2014, which published as PCT Publication Number WO2015/089364 on Jun. 18, 2015. This application claims priority from U.S. provisional patent applications: 61/915,251, filed Dec. 12, 2013; and 61/930,214, filed Jan. 22, 2014.

The foregoing applications, and all documents cited therein or during their prosecution (“appln cited documents”) and all documents cited or referenced in the appln cited documents, and all documents cited or referenced herein (“herein cited documents”), and all documents cited or referenced in herein cited documents, together with any manufacturer's instructions, descriptions, product specifications, and product sheets for any products mentioned herein or in any document incorporated by reference herein, are hereby incorporated herein by reference, and may be employed in the practice of the invention. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.

STATEMENT AS TO FEDERALLY SPONSORED RESEARCH

This invention was made with government support under Grant No. MH100706, awarded by the National Institutes of Health. The government has certain rights in the invention.

STATEMENT REGARDING SEQUENCE LISTING

The sequence listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the sequence listing is 114203-5779_Sequence.txt. The text file is 323 kb, was created on Jul. 19, 2019, and is being submitted electronically via EFS-Web.

FIELD OF THE INVENTION

The present invention generally relates to systems, methods and compositions used for the control of gene expression involving sequence targeting, such as genome perturbation or gene-editing, that may use vector systems related to Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and components thereof.

This invention was made with government support under PRESTO (Precursory Research for Embryonic Science and Technology, Sakigake) in the field of “Structural life science and advanced core technologies for innovative life science research”, awarded by JST (Japan Science and Technology Agency) in 2012. JST has certain rights in the invention.

This invention was made with government support under CREST in the field of “Creation of Basic Medical Technologies to Clarify and Control the Mechanisms Underlying Chronic Inflammation”, awarded by JST (Japan Science and Technology Agency) in 2013. JST has certain rights in the invention.

BACKGROUND OF THE INVENTION

Recent advances in genome sequencing techniques and analysis methods have significantly accelerated the ability to catalog and map genetic factors associated with a diverse range of biological functions and diseases. Precise genome targeting technologies are needed to enable systematic reverse engineering of causal genetic variations by allowing selective perturbation of individual genetic elements, as well as to advance synthetic biology, biotechnological, and medical applications. Although genome-editing techniques such as designer zinc fingers, transcription activator-like effectors (TALEs), or homing meganucleases are available for producing targeted genome perturbations, there remains a need for new genome engineering technologies that are affordable, easy to set up, scalable, and amenable to targeting multiple positions within the eukaryotic genome.

SUMMARY OF THE INVENTION

There exists a pressing need for alternative and robust systems and techniques for sequence targeting with a wide array of applications. This invention addresses this need and provides related advantages. The CRISPR/Cas or the CRISPR-Cas system (both terms are used interchangeably throughout this application) does not require the generation of customized proteins to target specific sequences; rather, a single Cas enzyme can be programmed by a short RNA molecule to recognize a specific DNA target. In other words, the Cas enzyme can be recruited to a specific DNA target using said short RNA molecule. Adding the CRISPR-Cas system to the repertoire of genome sequencing techniques and analysis methods may significantly simplify the methodology and accelerate the ability to catalog and map genetic factors associated with a diverse range of biological functions and diseases. To utilize the CRISPR-Cas system effectively for genome editing without deleterious effects, it is critical to understand aspects of engineering and optimization of these genome engineering tools, which are aspects of the claimed invention.

In one aspect, the invention provides a method for altering or modifying expression of a gene product. The method may comprise introducing into a cell containing and expressing a DNA molecule encoding the gene product an engineered, non-naturally occurring CRISPR-Cas system comprising a Cas protein and guide RNA that targets the DNA molecule, whereby the guide RNA targets the DNA molecule encoding the gene product and the Cas protein cleaves the DNA molecule encoding the gene product, whereby expression of the gene product is altered; and, wherein the Cas protein and the guide RNA do not naturally occur together. The invention comprehends the guide RNA comprising a guide sequence fused to a tracr sequence. The invention further comprehends the Cas protein being codon optimized for expression in a eukaryotic cell. In a preferred embodiment the eukaryotic cell is a mammalian cell and in a more preferred embodiment the mammalian cell is a human cell. In a further embodiment of the invention, the expression of the gene product is decreased.

In one aspect, the invention provides an engineered, non-naturally occurring CRISPR-Cas system comprising a Cas protein and a guide RNA that targets a DNA molecule encoding a gene product in a cell, whereby the guide RNA targets the DNA molecule encoding the gene product and the Cas protein cleaves the DNA molecule encoding the gene product, whereby expression of the gene product is altered; and, wherein the Cas protein and the guide RNA do not naturally occur together. The invention comprehends the guide RNA comprising a guide sequence fused to a tracr sequence. In an embodiment of the invention the Cas protein is a type II CRISPR-Cas protein and in a preferred embodiment the Cas protein is a Cas9 protein. The invention further comprehends the Cas protein being codon optimized for expression in a eukaryotic cell. In a preferred embodiment the eukaryotic cell is a mammalian cell and in a more preferred embodiment the mammalian cell is a human cell. In a further embodiment of the invention, the expression of the gene product is decreased.

In another aspect, the invention provides an engineered, non-naturally occurring vector system comprising one or more vectors comprising (a) a first regulatory element operably linked to a CRISPR-Cas system guide RNA that targets a DNA molecule encoding a gene product and (b) a second regulatory element operably linked to a Cas protein. Components (a) and (b) may be located on the same or different vectors of the system. The guide RNA targets the DNA molecule encoding the gene product in a cell and the Cas protein cleaves the DNA molecule encoding the gene product, whereby expression of the gene product is altered; and, wherein the Cas protein and the guide RNA do not naturally occur together. The invention comprehends the guide RNA comprising a guide sequence fused to a tracr sequence. In an embodiment of the invention the Cas protein is a type II CRISPR-Cas protein and in a preferred embodiment the Cas protein is a Cas9 protein. The invention further comprehends the Cas protein being codon optimized for expression in a eukaryotic cell. In a preferred embodiment the eukaryotic cell is a mammalian cell and in a more preferred embodiment the mammalian cell is a human cell. In a further embodiment of the invention, the expression of the gene product is decreased.

In one aspect, the invention provides a vector system comprising one or more vectors. In some embodiments, the system comprises: (a) a first regulatory element operably linked to a tracr mate sequence and one or more insertion sites for inserting one or more guide sequences upstream of the tracr mate sequence, wherein when expressed, the guide sequence directs sequence-specific binding of a CRISPR complex to a target sequence in a eukaryotic cell, wherein the CRISPR complex comprises a CRISPR enzyme complexed with (1) the guide sequence that is hybridized to the target sequence, and (2) the tracr mate sequence that is hybridized to the tracr sequence; and (b) a second regulatory element operably linked to an enzyme-coding sequence encoding said CRISPR enzyme comprising a nuclear localization sequence; wherein components (a) and (b) are located on the same or different vectors of the system. In some embodiments, component (a) further comprises the tracr sequence downstream of the tracr mate sequence under the control of the first regulatory element. In some embodiments, component (a) further comprises two or more guide sequences operably linked to the first regulatory element, wherein when expressed, each of the two or more guide sequences directs sequence-specific binding of a CRISPR complex to a different target sequence in a eukaryotic cell. In some embodiments, the system comprises the tracr sequence under the control of a third regulatory element, such as a polymerase III promoter. In some embodiments, the tracr sequence exhibits at least 50%, 60%, 70%, 80%, 90%, 95%, or 99% of sequence complementarity along the length of the tracr mate sequence when optimally aligned. Determining optimal alignment is within the purview of one of skill in the art. For example, there are publicly and commercially available alignment algorithms and programs such as, but not limited to, ClustalW, Smith-Waterman in MATLAB, Bowtie, Geneious, Biopython and SeqMan.
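The complementarity threshold described above can be illustrated with a short calculation. The following Python sketch is a simplified illustration only, not the alignment procedure of any of the programs named above: it assumes RNA sequences written 5' to 3', counts Watson-Crick pairs only (no G-U wobble pairs), and slides the tracr mate sequence along the tracr sequence without gaps, whereas the named algorithms generally permit gaps. The function names and the example sequences are hypothetical.

```python
# Watson-Crick pairing partners for RNA bases (wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def percent_complementarity(tracr_mate: str, tracr: str) -> float:
    """Best ungapped antiparallel pairing, as a percentage of tracr mate length."""
    mate = tracr_mate.upper()
    # A tracr mate base is paired wherever it equals the reverse complement
    # of the tracr sequence at the aligned position.
    target = reverse_complement(tracr)
    best = 0
    for offset in range(-len(mate) + 1, len(target)):
        matches = 0
        for i, base in enumerate(mate):
            j = offset + i
            if 0 <= j < len(target) and target[j] == base:
                matches += 1
        best = max(best, matches)
    return 100.0 * best / len(mate)


if __name__ == "__main__":
    # Hypothetical 8-nt sequences chosen only to exercise the function.
    print(percent_complementarity("GUUUUAGA", "UCUAAAAC"))
    print(percent_complementarity("GUUUUAGA", "UCUAAAAG"))
```

Dividing by the length of the tracr mate sequence follows the phrase "along the length of the tracr mate sequence"; a gapped local alignment would be the natural refinement where insertions or deletions are expected.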
In some embodiments, the CRISPR complex comprises one or more nuclear localization sequences of sufficient strength to drive accumulation of said CRISPR complex in a detectable amount in the nucleus of a eukaryotic cell. Without wishing to be bound by theory, it is believed that a nuclear localization sequence is not necessary for CRISPR complex activity in eukaryotes, but that including such sequences enhances activity of the system, especially as to targeting nucleic acid molecules in the nucleus. In some embodiments, the CRISPR enzyme is a type II CRISPR system enzyme. In some embodiments, the CRISPR enzyme is a Cas9 enzyme. In some embodiments, the Cas9 enzyme is S. pneumoniae, S. pyogenes, or S. thermophilus Cas9, and may include mutated Cas9 derived from these organisms. The enzyme may be a Cas9 homolog or ortholog. In some embodiments, the CRISPR enzyme is codon-optimized for expression in a eukaryotic cell. In some embodiments, the CRISPR enzyme directs cleavage of one or two strands at the location of the target sequence. In some embodiments, the first regulatory element is a polymerase III promoter. In some embodiments, the second regulatory element is a polymerase II promoter. In some embodiments, the guide sequence is at least 15, 16, 17, 18, 19, 20, 25 nucleotides, or between 10-30, or between 15-25, or between 15-20 nucleotides in length. In general, and throughout this specification, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g. circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art.
One type of vector is a “plasmid,” which refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g. retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g. bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively-linked. Such vectors are referred to herein as “expression vectors.” Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids.
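The two-component vector system of this aspect, with component (a) carrying the guide cassette and component (b) carrying the enzyme-coding sequence, can be summarized as a small data structure. The Python sketch below is illustrative only: the class names, field names, promoter labels, and the `review_vector_system` helper are hypothetical, and the checks merely report where a configuration departs from the "some embodiments" recited above (a polymerase III promoter for the first regulatory element, a polymerase II promoter for the second, at least one nuclear localization sequence, and a guide sequence of 10-30 nucleotides). They are not requirements of the invention.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical short labels for promoters named in this specification.
POL_III_PROMOTERS = {"U6", "H1"}
POL_II_PROMOTERS = {"RSV", "CMV", "SV40", "DHFR", "beta-actin", "PGK", "EF1alpha"}


@dataclass
class GuideCassette:
    """Component (a): first regulatory element, guide(s), tracr mate."""
    promoter: str
    guide_sequences: List[str]
    has_tracr: bool = False  # tracr sequence downstream of the tracr mate


@dataclass
class EnzymeCassette:
    """Component (b): second regulatory element and CRISPR enzyme."""
    promoter: str
    enzyme: str
    nls_count: int = 1
    codon_optimized: bool = True


def review_vector_system(guide: GuideCassette, enzyme: EnzymeCassette) -> List[str]:
    """List the points where a design departs from the recited embodiments."""
    notes = []
    if guide.promoter not in POL_III_PROMOTERS:
        notes.append("first regulatory element is not a listed pol III promoter")
    if enzyme.promoter not in POL_II_PROMOTERS:
        notes.append("second regulatory element is not a listed pol II promoter")
    if enzyme.nls_count < 1:
        notes.append("CRISPR enzyme has no nuclear localization sequence")
    for seq in guide.guide_sequences:
        if not 10 <= len(seq) <= 30:
            notes.append(f"guide of {len(seq)} nt is outside the 10-30 nt range")
    return notes


if __name__ == "__main__":
    guide = GuideCassette(promoter="U6", guide_sequences=["GACGTTACCGGATCAAGTCA"])
    cas9 = EnzymeCassette(promoter="CMV", enzyme="SpCas9")
    print(review_vector_system(guide, cas9))
```

Whether the two cassettes sit on the same vector or on different vectors is left out of the sketch, because the specification allows either arrangement.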

Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively-linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g. in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).

The term “regulatory element” is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g. transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g. liver, pancreas), or particular cell types (e.g. lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle dependent or developmental stage-dependent manner, which may or may not also be tissue or cell-type specific. In some embodiments, a vector comprises one or more pol III promoters (e.g. 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g. 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g. 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, U6 and H1 promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al, Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter.
Also encompassed by the term “regulatory element” are enhancer elements, such as WPRE; CMV enhancers; the R-U5′ segment in LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), p. 466-472, 1988); SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), p. 1527-31, 1981). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc. A vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., clustered regularly interspaced short palindromic repeats (CRISPR) transcripts, proteins, enzymes, mutant forms thereof, fusion proteins thereof, etc.).

Advantageous vectors include lentiviruses and adeno-associated viruses, and types of such vectors can also be selected for targeting particular types of cells.

In one aspect, the invention provides a eukaryotic host cell comprising (a) a first regulatory element operably linked to a tracr mate sequence and one or more insertion sites for inserting one or more guide sequences upstream of the tracr mate sequence, wherein when expressed, the guide sequence directs sequence-specific binding of a CRISPR complex to a target sequence in a eukaryotic cell, wherein the CRISPR complex comprises a CRISPR enzyme complexed with (1) the guide sequence that is hybridized to the target sequence, and (2) the tracr mate sequence that is hybridized to the tracr sequence; and/or (b) a second regulatory element operably linked to an enzyme-coding sequence encoding said CRISPR enzyme comprising a nuclear localization sequence. In some embodiments, the host cell comprises components (a) and (b). In some embodiments, component (a), component (b), or components (a) and (b) are stably integrated into a genome of the host eukaryotic cell. In some embodiments, component (a) further comprises the tracr sequence downstream of the tracr mate sequence under the control of the first regulatory element. In some embodiments, component (a) further comprises two or more guide sequences operably linked to the first regulatory element, wherein when expressed, each of the two or more guide sequences directs sequence-specific binding of a CRISPR complex to a different target sequence in a eukaryotic cell. In some embodiments, the eukaryotic host cell further comprises a third regulatory element, such as a polymerase III promoter, operably linked to said tracr sequence. In some embodiments, the tracr sequence exhibits at least 50%, 60%, 70%, 80%, 90%, 95%, or 99% of sequence complementarity along the length of the tracr mate sequence when optimally aligned. The enzyme may be a Cas9 homolog or ortholog. In some embodiments, the CRISPR enzyme is codon-optimized for expression in a eukaryotic cell.
In some embodiments, the CRISPR enzyme directs cleavage of one or two strands at the location of the target sequence. In some embodiments, the CRISPR enzyme lacks DNA strand cleavage activity. In some embodiments, the first regulatory element is a polymerase III promoter. In some embodiments, the second regulatory element is a polymerase II promoter. In some embodiments, the guide sequence is at least 15, 16, 17, 18, 19, 20, 25 nucleotides, or between 10-30, or between 15-25, or between 15-20 nucleotides in length. In an aspect, the invention provides a non-human eukaryotic organism; preferably a multicellular eukaryotic organism, comprising a eukaryotic host cell according to any of the described embodiments. In other aspects, the invention provides a eukaryotic organism; preferably a multicellular eukaryotic organism, comprising a eukaryotic host cell according to any of the described embodiments. The organism in some embodiments of these aspects may be an animal; for example a mammal. Also, the organism may be an arthropod such as an insect. The organism also may be a plant. Further, the organism may be a fungus.

In one aspect, the invention provides a kit comprising one or more of the components described herein. In some embodiments, the kit comprises a vector system and instructions for using the kit. In some embodiments, the vector system comprises (a) a first regulatory element operably linked to a tracr mate sequence and one or more insertion sites for inserting one or more guide sequences upstream of the tracr mate sequence, wherein when expressed, the guide sequence directs sequence-specific binding of a CRISPR complex to a target sequence in a eukaryotic cell, wherein the CRISPR complex comprises a CRISPR enzyme complexed with (1) the guide sequence that is hybridized to the target sequence, and (2) the tracr mate sequence that is hybridized to the tracr sequence; and/or (b) a second regulatory element operably linked to an enzyme-coding sequence encoding said CRISPR enzyme comprising a nuclear localization sequence. In some embodiments, the kit comprises components (a) and (b) located on the same or different vectors of the system. In some embodiments, component (a) further comprises the tracr sequence downstream of the tracr mate sequence under the control of the first regulatory element. In some embodiments, component (a) further comprises two or more guide sequences operably linked to the first regulatory element, wherein when expressed, each of the two or more guide sequences directs sequence-specific binding of a CRISPR complex to a different target sequence in a eukaryotic cell. In some embodiments, the system further comprises a third regulatory element, such as a polymerase III promoter, operably linked to said tracr sequence. In some embodiments, the tracr sequence exhibits at least 50%, 60%, 70%, 80%, 90%, 95%, or 99% of sequence complementarity along the length of the tracr mate sequence when optimally aligned.
In some embodiments, the CRISPR enzyme comprises one or more nuclear localization sequences of sufficient strength to drive accumulation of said CRISPR enzyme in a detectable amount in the nucleus of a eukaryotic cell. In some embodiments, the CRISPR enzyme is a type II CRISPR system enzyme. In some embodiments, the CRISPR enzyme is a Cas9 enzyme. In some embodiments, the Cas9 enzyme is S. pneumoniae, S. pyogenes or S. thermophilus Cas9, and may include mutated Cas9 derived from these organisms. The enzyme may be a Cas9 homolog or ortholog. In some embodiments, the CRISPR enzyme is codon-optimized for expression in a eukaryotic cell. In some embodiments, the CRISPR enzyme directs cleavage of one or two strands at the location of the target sequence. In some embodiments, the CRISPR enzyme lacks DNA strand cleavage activity. In some embodiments, the first regulatory element is a polymerase III promoter. In some embodiments, the second regulatory element is a polymerase II promoter. In some embodiments, the guide sequence is at least 15, 16, 17, 18, 19, 20, 25 nucleotides, or between 10-30, or between 15-25, or between 15-20 nucleotides in length.

In one aspect, the invention provides a method of modifying a target polynucleotide in a eukaryotic cell. In some embodiments, the method comprises allowing a CRISPR complex to bind to the target polynucleotide to effect cleavage of said target polynucleotide thereby modifying the target polynucleotide, wherein the CRISPR complex comprises a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence within said target polynucleotide, wherein said guide sequence is linked to a tracr mate sequence which in turn hybridizes to a tracr sequence. In some embodiments, said cleavage comprises cleaving one or two strands at the location of the target sequence by said CRISPR enzyme. In some embodiments, said cleavage results in decreased transcription of a target gene. In some embodiments, the method further comprises repairing said cleaved target polynucleotide by homologous recombination with an exogenous template polynucleotide, wherein said repair results in a mutation comprising an insertion, deletion, or substitution of one or more nucleotides of said target polynucleotide. In some embodiments, said mutation results in one or more amino acid changes in a protein expressed from a gene comprising the target sequence. In some embodiments, the method further comprises delivering one or more vectors to said eukaryotic cell, wherein the one or more vectors drive expression of one or more of: the CRISPR enzyme, the guide sequence linked to the tracr mate sequence, and the tracr sequence. In some embodiments, said vectors are delivered to the eukaryotic cell in a subject. In some embodiments, said modifying takes place in said eukaryotic cell in a cell culture. In some embodiments, the method further comprises isolating said eukaryotic cell from a subject prior to said modifying. In some embodiments, the method further comprises returning said eukaryotic cell and/or cells derived therefrom to said subject.

In one aspect, the invention provides a method of modifying expression of a polynucleotide in a eukaryotic cell. In some embodiments, the method comprises allowing a CRISPR complex to bind to the polynucleotide such that said binding results in increased or decreased expression of said polynucleotide; wherein the CRISPR complex comprises a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence within said polynucleotide, wherein said guide sequence is linked to a tracr mate sequence which in turn hybridizes to a tracr sequence. In some embodiments, the method further comprises delivering one or more vectors to said eukaryotic cell, wherein the one or more vectors drive expression of one or more of: the CRISPR enzyme, the guide sequence linked to the tracr mate sequence, and the tracr sequence.

In one aspect, the invention provides a method of generating a model eukaryotic cell comprising a mutated disease gene. In some embodiments, a disease gene is any gene associated with an increase in the risk of having or developing a disease. In some embodiments, the method comprises (a) introducing one or more vectors into a eukaryotic cell, wherein the one or more vectors drive expression of one or more of: a CRISPR enzyme, a guide sequence linked to a tracr mate sequence, and a tracr sequence; and (b) allowing a CRISPR complex to bind to a target polynucleotide to effect cleavage of the target polynucleotide within said disease gene, wherein the CRISPR complex comprises the CRISPR enzyme complexed with (1) the guide sequence that is hybridized to the target sequence within the target polynucleotide, and (2) the tracr mate sequence that is hybridized to the tracr sequence, thereby generating a model eukaryotic cell comprising a mutated disease gene. In some embodiments, said cleavage comprises cleaving one or two strands at the location of the target sequence by said CRISPR enzyme. In some embodiments, said cleavage results in decreased transcription of a target gene. In some embodiments, the method further comprises repairing said cleaved target polynucleotide by homologous recombination with an exogenous template polynucleotide, wherein said repair results in a mutation comprising an insertion, deletion, or substitution of one or more nucleotides of said target polynucleotide. In some embodiments, said mutation results in one or more amino acid changes in a protein expressed from a gene comprising the target sequence.

In one aspect, the invention provides a method for developing a biologically active agent that modulates a cell signaling event associated with a disease gene. In some embodiments, a disease gene is any gene associated with an increase in the risk of having or developing a disease. In some embodiments, the method comprises (a) contacting a test compound with a model cell of any one of the described embodiments; and (b) detecting a change in a readout that is indicative of a reduction or an augmentation of a cell signaling event associated with said mutation in said disease gene, thereby developing said biologically active agent that modulates said cell signaling event associated with said disease gene.

In one aspect, the invention provides a recombinant polynucleotide comprising a guide sequence upstream of a tracr mate sequence, wherein the guide sequence when expressed directs sequence-specific binding of a CRISPR complex to a corresponding target sequence present in a eukaryotic cell. In some embodiments, the target sequence is a viral sequence present in a eukaryotic cell. In some embodiments, the target sequence is a proto-oncogene or an oncogene.

In one aspect the invention provides for a method of selecting one or more cell(s) by introducing one or more mutations in a gene in the one or more cell(s), the method comprising: introducing one or more vectors into the cell(s), wherein the one or more vectors drive expression of one or more of: a CRISPR enzyme, a guide sequence linked to a tracr mate sequence, a tracr sequence, and an editing template; wherein the editing template comprises the one or more mutations that abolish CRISPR enzyme cleavage; allowing homologous recombination of the editing template with the target polynucleotide in the cell(s) to be selected; allowing a CRISPR complex to bind to a target polynucleotide to effect cleavage of the target polynucleotide within said gene, wherein the CRISPR complex comprises the CRISPR enzyme complexed with (1) the guide sequence that is hybridized to the target sequence within the target polynucleotide, and (2) the tracr mate sequence that is hybridized to the tracr sequence, wherein binding of the CRISPR complex to the target polynucleotide induces cell death, thereby allowing one or more cell(s) in which one or more mutations have been introduced to be selected. In a preferred embodiment, the CRISPR enzyme is Cas9. In another preferred embodiment of the invention the cell to be selected may be a eukaryotic cell. Aspects of the invention allow for selection of specific cells without requiring a selection marker or a two-step process that may include a counter-selection system.

In another aspect the invention comprehends a CRISPR-cas9 (S. pyogenes) system having an X-ray diffraction pattern corresponding to or resulting from any or all of the foregoing and/or a crystal having the structure defined by the co-ordinates of the Crystal Structure Table in Example 1 (the CRISPR-cas9 crystal structure).

In a further aspect, the invention involves a computer-assisted method for identifying or designing potential compounds to fit within or bind to CRISPR-cas9 system or a functional portion thereof or vice versa (a computer-assisted method for identifying or designing potential CRISPR-cas9 systems or a functional portion thereof for binding to desired compounds) or a computer-assisted method for identifying or designing potential CRISPR-cas9 systems (e.g., with regard to predicting areas of the CRISPR-cas9 system to be able to be manipulated, for instance based on crystal structure data or based on data of cas9 orthologs, or with respect to where a functional group such as an activator or repressor can be attached to the CRISPR-cas9 system, or as to cas9 truncations or as to designing nickases), said method comprising:

using a computer system, e.g., a programmed computer comprising a processor, a data storage system, an input device, and an output device, the steps of:

(a) inputting into the programmed computer through said input device data comprising the three-dimensional co-ordinates of a subset of the atoms from or pertaining to the CRISPR-cas9 crystal structure, e.g., in the CRISPR-cas9 system binding domain or alternatively or additionally in domains that vary based on variance among cas9 orthologs or as to cas9s or as to nickases or as to functional groups, optionally with structural information from CRISPR-cas9 system complex(es), thereby generating a data set;

(b) comparing, using said processor, said data set to a computerdatabase of structures stored in said computer data storage system,e.g., structures of compounds that bind or putatively bind or that aredesired to bind to a CRISPR-cas9 system or as to cas9 orthologs (e.g.,as cas9s or as to domains or regions that vary amongst cas9 orthologs)or as to the CRISPR-cas9 crystal structure or as to nickases or as tofunctional groups;

(c) selecting from said database, using computer methods,structure(s)—e.g., CRISPR-cas9 structures that may bind to desiredstructures, desired structures that may bind to certain CRISPR-cas9structures, portions of the CRISPR-cas9 system that may be manipulated,e.g., based on data from other portions of the CRISPR-cas9 crystralstructure and/or from cas9 orthologs, truncated cas9s, novel nickases orparticular functional groups, or positions for attaching functionalgroups or functional-group-CRISPR-cas9 systems;

(d) constructing, using computer methods, a model of the selectedstructure(s); and

(e) outputting to said output device the selected structure(s);

and optionally synthesizing one or more of the selected structure(s);

and further optionally testing said synthesized selected structure(s) asor in a CRISPR-cas9 system;

or, said method comprising: providing the co-ordinates of at least twoatoms of the CRISPR-cas9 crystal structure, e.g., at least two atoms ofthe herein Crystral Structure Table of the CRISPR-cas9 crystal structureor co-ordinates of at least a sub-domain of the CRISPR-cas9 crystralstructure (“selected co-ordinates”), providing the structure of acandidate comprising a binding molecule or of portions of theCRISPR-cas9 system that may be manipulated, e.g., based on data fromother portions of the CRISPR-cas9 crystral structure and/or from cas9orthologs, or the structure of functional groups, and fitting thestructure of the candidate to the selected coordinates, to therebyobtain product data comprising CRISPR-cas9 structures that may bind todesired structures, desired structures that may bind to certainCRISPR-cas9 structures, portions of the CRISPR-cas9 system that may bemanipulated, truncated cas9s, novel nickases, or particular functionalgroups, or positions for attaching functional groups orfunctional-group-CRISPR-cas9 systems, with output thereof; andoptionally synthesizing compound(s) from said product data and furtheroptionally comprising testing said synthesized compound(s) as or in aCRISPR-cas9 system.

The testing can comprise analyzing the CRISPR-cas9 system resulting from said synthesized selected structure(s), e.g., with respect to binding, or performing a desired function.
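By way of illustration only, the inputting step (a) of the foregoing method, in which the co-ordinates of a subset of atoms are read to generate a data set, might be sketched as follows. The sketch assumes that the co-ordinates are supplied as fixed-column PDB-format ATOM records, which is an assumption and not a statement about the format of the Crystal Structure Table. The names `Atom`, `parse_atoms`, `select_residue_range` and `format_atom_record` are hypothetical, and the demonstration co-ordinates are invented placeholder values, not data from the crystal structure.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "ASP"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Parse ATOM/HETATM records from fixed-column PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


def select_residue_range(atoms: List[Atom], chain: str,
                         first_res: int, last_res: int) -> List[Atom]:
    """Return the subset of atoms in one chain within a residue range."""
    return [a for a in atoms
            if a.chain == chain and first_res <= a.res_seq <= last_res]


def format_atom_record(serial, name, res_name, chain, res_seq, x, y, z) -> str:
    """Build one ATOM record; used here only to create toy input."""
    return "{:<6}{:>5} {:<4} {:>3} {:1}{:>4}    {:>8.3f}{:>8.3f}{:>8.3f}".format(
        "ATOM", serial, name, res_name, chain, res_seq, x, y, z)


# Toy input: the co-ordinates below are made up for demonstration.
toy_text = "\n".join([
    format_atom_record(1, "N", "ASP", "A", 10, 1.0, 2.0, 3.0),
    format_atom_record(2, "CA", "ASP", "A", 10, 1.5, 2.5, 3.5),
    format_atom_record(3, "CA", "HIS", "A", 840, 9.0, 8.0, 7.0),
    format_atom_record(4, "CA", "GLY", "B", 1, 0.0, 0.0, 0.0),
])

atoms = parse_atoms(toy_text)
subset = select_residue_range(atoms, chain="A", first_res=1, last_res=100)
```

The resulting `subset` plays the role of the data set of step (a); comparison against a database of structures as in step (b) would operate on such a list.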

The output in the foregoing methods can comprise data transmission, e.g., transmission of information via telecommunication, telephone, video conference, mass communication, e.g., presentation such as a computer presentation (e.g., POWERPOINT), internet, email, documentary communication such as a computer program (e.g., WORD) document and the like. Accordingly, the invention also comprehends computer readable media containing: atomic co-ordinate data according to the herein Crystal Structure Table and/or the Figures, said data defining the three dimensional structure of CRISPR-cas9 or at least one sub-domain thereof, or structure factor data for CRISPR-cas9, said structure factor data being derivable from the atomic co-ordinate data of herein Crystal Structure Table and/or the Figures. The computer readable media can also contain any data of the foregoing methods. The invention further comprehends a computer system for generating or performing rational design as in the foregoing methods containing either: atomic co-ordinate data according to herein Crystal Structure Table and/or the Figures, said data defining the three dimensional structure of CRISPR-cas9 or at least one sub-domain thereof, or structure factor data for CRISPR-cas9, said structure factor data being derivable from the atomic co-ordinate data of herein Crystal Structure Table and/or the Figures. The invention further comprehends a method of doing business comprising providing to a user the computer system or the media or the three dimensional structure of CRISPR-cas9 or at least one sub-domain thereof, or structure factor data for CRISPR-cas9, said structure set forth in and said structure factor data being derivable from the atomic co-ordinate data of herein Crystal Structure Table and/or the Figures, or the herein computer media or a herein data transmission.

A “binding site” or an “active site” comprises or consists essentially of a site (such as an atom, a functional group of an amino acid residue or a plurality of such atoms and/or groups) in a binding cavity or region, which may bind to a compound such as a nucleic acid molecule, which is/are involved in binding.

By “fitting”, is meant determining by automatic, or semi-automatic means, interactions between one or more atoms of a candidate molecule and at least one atom of a structure of the invention, and calculating the extent to which such interactions are stable. Interactions include attraction and repulsion, brought about by charge, steric considerations and the like. Various computer-based methods for fitting are described further herein.
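As a minimal sketch only, and not the fitting method of the invention, the notion of scoring attraction and repulsion brought about by charge and steric considerations can be illustrated with a toy pairwise energy. The sketch sums a Coulomb term and a Lennard-Jones 12-6 term over every candidate-atom and structure-atom pair. The functional form, the names `PointAtom`, `pair_energy` and `interaction_score`, and the values of `EPSILON` and `SIGMA` are all assumptions chosen for illustration.

```python
import math
from typing import NamedTuple, Sequence


class PointAtom(NamedTuple):
    x: float
    y: float
    z: float
    charge: float  # partial charge, in units of the elementary charge


COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2)
EPSILON = 0.1              # hypothetical well depth, kcal/mol
SIGMA = 3.4                # hypothetical contact distance, Angstrom


def pair_energy(a: PointAtom, b: PointAtom) -> float:
    """Toy interaction energy of two atoms; lower is more stable."""
    r = math.dist(a[:3], b[:3])  # atoms must not coincide (r > 0)
    electrostatic = COULOMB_CONSTANT * a.charge * b.charge / r
    ratio6 = (SIGMA / r) ** 6
    steric = 4.0 * EPSILON * (ratio6 ** 2 - ratio6)
    return electrostatic + steric


def interaction_score(candidate: Sequence[PointAtom],
                      structure: Sequence[PointAtom]) -> float:
    """Sum the pairwise energies between candidate and structure atoms."""
    return sum(pair_energy(a, b) for a in candidate for b in structure)


# One negatively charged structure atom at the origin.
structure = [PointAtom(0.0, 0.0, 0.0, -0.5)]
# An oppositely charged candidate atom at 4 Angstrom is attracted.
favorable = interaction_score([PointAtom(4.0, 0.0, 0.0, 0.5)], structure)
# The same atom at 1 Angstrom clashes sterically despite the attraction.
clash = interaction_score([PointAtom(1.0, 0.0, 0.0, 0.5)], structure)
```

Under these assumed parameters `favorable` is negative and `clash` is positive, so a fitting procedure that minimizes the score would prefer the first placement.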

By “root mean square (or rms) deviation”, we mean the square root of the arithmetic mean of the squares of the deviations from the mean.
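The foregoing definition can be written out directly. The following is a minimal sketch that follows the wording of the definition literally; the function name `rms_deviation` is hypothetical.

```python
import math
from typing import Sequence


def rms_deviation(values: Sequence[float]) -> float:
    """Square root of the arithmetic mean of squared deviations from the mean."""
    if not values:
        raise ValueError("at least one value is required")
    mean = sum(values) / len(values)
    mean_square = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(mean_square)


# Mean is 5; squared deviations sum to 32; 32 / 8 = 4; sqrt(4) = 2.
example = rms_deviation([2, 4, 4, 4, 5, 5, 7, 9])  # 2.0
```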

By a “computer system”, is meant the hardware means, software means and data storage means used to analyze atomic coordinate data. The minimum hardware means of the computer-based systems of the present invention typically comprises a central processing unit (CPU), input means, output means and data storage means. Desirably a display or monitor is provided to visualize structure data. The data storage means may be RAM or means for accessing computer readable media of the invention. Examples of such systems are computer and tablet devices running Unix, Windows or Apple operating systems.

By “computer readable media”, is meant any medium or media, which can be read and accessed directly or indirectly by a computer e.g. so that the media is suitable for use in the above-mentioned computer system. Such media include, but are not limited to: magnetic storage media such as floppy discs, hard disc storage medium and magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; thumb drive devices; cloud storage devices and hybrids of these categories such as magnetic/optical storage media.

Accordingly, it is an object of the invention not to encompass within the invention any previously known product, process of making the product, or method of using the product such that Applicants reserve the right and hereby disclose a disclaimer of any previously known product, process, or method. It is further noted that the invention does not intend to encompass within the scope of the invention any product, process, or making of the product or method of using the product, which does not meet the written description and enablement requirements of the USPTO (35 U.S.C. § 112, first paragraph) or the EPO (Article 83 of the EPC), such that Applicants reserve the right and hereby disclose a disclaimer of any previously described product, process of making the product, or method of using the product.

It is noted that in this disclosure and particularly in the claims and/or paragraphs, terms such as “comprises”, “comprised”, “comprising” and the like can have the meaning attributed to them in U.S. Patent law; e.g., they can mean “includes”, “included”, “including”, and the like; and that terms such as “consisting essentially of” and “consists essentially of” have the meaning ascribed to them in U.S. Patent law, e.g., they allow for elements not explicitly recited, but exclude elements that are found in the prior art or that affect a basic or novel characteristic of the invention. It may be advantageous in the practice of the invention to be in compliance with Art. 53(c) EPC and Rule 28(b) and (c) EPC. Nothing herein is to be construed as a promise.

These and other embodiments are disclosed or are obvious from and encompassed by, the following Detailed Description.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIGS. 1A-M provide: various views of the CRISPR-cas complex crystal structure (A-I), chimeric RNA architecture from the crystal structure (J-K), an interaction schematic from the crystal structure (L) and a topology schematic from the crystal structure (M), being a diagram showing the topology of the Cas9 protein in which helices are shown as tubes and beta sheets are shown as arrows. FIG. 1J discloses SEQ ID NO: 108. FIG. 1L discloses SEQ ID NOS 109 and 110.

FIGS. 2A-C show, from the crystal structure, a schematic showing catalytic domains of SpCas9 and sites of mutagenesis for new nickases (A), a schematic showing locations of sgRNAs for testing double nicking (B), and Surveyor gel test results showing 1 HNH mutant N854A that retains nickase activity, and 1 HNH mutant that shows nickase activity (N863A), and 2 RuvCIII mutants that show nickase activity (H983A, D986A) (C).

FIGS. 3A-B show Surveyor gel test results of SpCas9 truncation mutants from the crystal structure that retain cleavage activity (A) and a table showing the amino acid truncations and flexible (GGGS) (SEQ ID NO: 1) or rigid (A(EAAAK)) (SEQ ID NO: 2) linker substitutions of the lanes of the gels of FIG. 25A (B). FIG. 3B discloses linker sequences as SEQ ID NOS 45, 47, 49, 53, 111, 112, 113, 53, 111, 47, and 51, respectively, top to bottom, left to right, in order of appearance.

FIGS. 4A-B show SpCas9 sgRNAs from the crystal structure including those mutated to investigate contribution to activity of specific bases or groups of bases. FIG. 4A discloses SEQ ID NOS 114-137, and FIG. 4B discloses SEQ ID NOS 138-152, all respectively, in order of appearance.

FIGS. 5A-C show truncation and creation of chimeric (S. pyogenes) Cas9s based on the herein crystal structure, including mutants for mapping essential functional domains (A), chimeras that contain regions from S. thermophilus Cas9 (B), and designs for chemically inducible dimerization of SpCas9 (C).

FIG. 6 shows a picture of Cas9 crystals (0.2 mm).

FIG. 7 shows a structural figure showing Cas9 in a surface representation; red, sgRNA; cyan, the guide region of sgRNA; gold, target DNA.

FIG. 8A-D shows the overall structure. (A) Domain organization of S. pyogenes Cas9, and schematic of the sgRNA:target DNA complex. (B) Ribbon representation of the Cas9-sgRNA-DNA complex. Disordered linkers are shown as red dotted lines. (C) Surface representation of the Cas9-sgRNA-DNA complex. The active sites of the RuvC (D10A) and HNH (H840A) domains are indicated by dashed yellow circles. (D) Electrostatic surface potential of the Cas9-sgRNA-DNA complex. The HNH domain is omitted for clarity. Molecular graphic images were prepared using CueMol (see website at cuemol.org). Also refer to FIGS. 37 and 38.

FIG. 9A-E shows the REC lobe and PI domain. (A) Structure of the REC lobe. The REC2 domain and Bridge helix are colored dark gray and green, respectively. The REC1 domain is colored gray, with the repeat-interacting and anti-repeat-interacting regions colored pale blue and pink, respectively. The bound sgRNA:DNA is shown as semi-transparent ribbon representation. (B) Schematics indicating positions of SpCas9 truncations in the REC1 and REC2 domains. Bars on the right show indel mutations generated by the truncation mutants, measured by SURVEYOR assay (n=3, error bars show mean±S.E.M., N.D., not detectable). (C) Western blot showing expression of truncation mutants in HEK 293FT cells. (D) Structure of the PI domain. The bound sgRNA is shown as semi-transparent ribbon representation. (E) Schematics showing wild-type SpCas9 and St3Cas9, chimeric Cas9, as well as SpCas9 PI domain truncation constructs. Cas9s are assayed for indel generation at target sites upstream of either NGG (left bar graph) or NGGNG (right bar graph) PAMs (n=3, error bars show mean±S.E.M., N.D., not detectable). See also FIGS. 39-41. FIG. 9E discloses SEQ ID NOS 153 and 107, respectively, in order of appearance.

FIG. 10A-F shows the NUC lobe. (A) Structure of the RuvC domain. The core structure of the RNase H fold is highlighted in cyan. The active-site residues are shown as stick models. (B) Structure of the T. thermophilus RuvC dimer in complex with a Holliday junction (PDB ID 4LD0). The two protomers are colored cyan and gray, respectively. (C) Sequence (top) (SEQ ID NO: 154) illustrates Cas9 nicking targets on opposite strands of DNA. Targets 1 and 2 are offset by a distance of 4 bp. Heatmap (bottom) shows the ability of each catalytic mutant to induce double- (with either sgRNA 1 or 2) or single-stranded breaks (only with both sgRNAs together). Gray boxes: not assayed. (D) Indel formation by Cas9 nickases depends on off-set distance between sgRNA pairs (right panel). Off-set distance is defined as the number of base pairs between the PAM-distal (5′) ends of the guide sequences of a given sgRNA pair (n=3, error bars show mean±S.E.M., N.D., not detectable). (E) Structure of the HNH domain. The core structure of the ββα-metal fold is highlighted in magenta. The active-site residues are shown as stick models. (F) Structure of the T4 Endo VII dimer in complex with a Holliday junction (PDB ID 2QNC). The two protomers are colored pink and gray, respectively, with the ββα-metal fold core highlighted in magenta. The bound Mg²⁺ ion is shown as an orange sphere.

FIG. 11A-D shows sgRNA and its target DNA. (A) Schematic of the sgRNA:DNA complex. The guide and repeat regions of the crRNA sequence are colored sky blue and blue, respectively. The tracrRNA sequence is colored red, with the linker region colored violet. The target DNA and tetraloop are colored yellow and black, respectively. The numbering of the 3′ tails of tracrRNA is shown on red background. Watson-Crick and non-Watson-Crick base pairs are indicated by black and gray lines, respectively. Disordered nucleotides are boxed by dashed lines. (B) Structure of the sgRNA:DNA complex. (C) Structure of the repeat:anti-repeat duplex and three-way junction. Key interactions are shown as gray dashed lines. (D) Effect of sgRNA mutations on ability to induce indels. Base changes from the +83 sgRNA scaffold are shown at respective positions, with dashes indicating unaltered bases (n=3, error bars show mean±S.E.M., p values based on unpaired Student's t-test, N.D., not detectable). See also FIG. 42. FIG. 11A discloses SEQ ID NOS 155 and 156, and FIG. 11D discloses SEQ ID NOS 157-170, all respectively, in order of appearance.

FIG. 12A-K shows recognition of the sgRNA:DNA. (A) Schematic of sgRNA:DNA recognition by Cas9. Residues that interact with the sgRNA:DNA via their main chain are shown in parentheses. (B and D-K) Recognition of the guide (B), guide:DNA duplex (D), repeat (E), anti-repeat (F), three-way junction (G), stem loop 1 (H), linker (I), stem loop 2 (J) and stem loop 3 (K). Hydrogen bonds and salt bridges are shown as dashed lines. (C) Effect of Cas9 (top) and sgRNA (bottom) mutations on ability to induce indels (n=3, error bars show mean±S.E.M., p values based on unpaired Student's t-test, N.D., not detectable). FIG. 12A discloses SEQ ID NOS 171 and 172.

FIG. 13A-D shows structural flexibility of the complex. (A) Structural comparison of Mol A and Mol B. In Mol A (left), the disordered linker between the RuvC and HNH domains is indicated by a dotted line. In Mol B (right), the disordered HNH domain is shown as a dashed circle. The flexible connecting segment (α40 and α41) in the RuvC domain is highlighted in orange. (B) Superimposition of the Cas9 proteins in Mol A and Mol B. The two complexes are superimposed based on the core β-sheet of their RuvC domains. The HNH domain and bound sgRNA:DNA are omitted for clarity. (C) Superimposition of the bound sgRNA:DNA in Mol A and Mol B. After superimposition of the two complexes as in (B), the Cas9 proteins are omitted to show the sgRNA:DNA. (D) Molecular surface of Cas9. The HNH domain and bound sgRNA:DNA complex are omitted for clarity. Note that there is no direct contact between the REC and NUC lobes, except for the interactions between the α2-α3 loop and β17-β18 loop.

FIG. 14 shows a model of RNA-guided DNA cleavage by Cas9.

FIG. 15 shows an electron density map. The 2mFo-DFc electron density map around the three-way junction is shown as a gray mesh (contoured at 2.5σ).

FIG. 16A-C shows that the di-cysteine mutant (C80L/C574E) is functional in HEK 293FT cells. (A) Schematic illustrating positions of cysteine mutations (C80L and C574E) in Cas9. (B) Sequence of the target site (SEQ ID NO: 173) used to test the function of the C80L/C574E mutant of Cas9. (C) SURVEYOR nuclease assay showing indels generated by either the wild-type or C80L/C574E mutant (n=3).

FIG. 17 shows a schematic drawing of the secondary structural elements of Cas9.

FIG. 18A-B shows the sequence alignment of Cas9 orthologs in families II-A and II-C (SEQ ID NOS 174-179, respectively, in order of appearance). The catalytic residues are shown in red triangles. Critical arginine residues on Bridge helix are shown in green triangles. The secondary structure of S. pyogenes Cas9 is shown above the sequences. The figure was prepared using TCoffee (Notredame et al., 2000) and ESPript (Gouet et al., 1999). Sp, S. pyogenes; Sm, Streptococcus mutans; St3, Streptococcus thermophilus CRISPR-3; St1, Streptococcus thermophilus CRISPR-1; Cj, Campylobacter jejuni; Nm, Neisseria meningitidis.

FIG. 19 shows the sequence alignment of Cas9 orthologs in families II-A, II-B and II-C. 35 Cas9 orthologs from families IIA, IIB and IIC are aligned (BLOSUM62) and clustered (Jukes-Cantor model Neighbor-Joining method, with S. pyogenes Cas9 as outgroup). Bars on top show conservation by amino acid. In each line, black bars show residues with at least 75% consensus, and gray bars non-conserved residues.

FIG. 20 shows the comparison of the sgRNA:DNA heteroduplex with a canonical A-form RNA duplex. The sgRNA:DNA heteroduplex is superimposed on an A-form RNA duplex based on their phosphorus atoms. The A-form RNA duplex is colored dark gray. Nucleotides 51-97 of the sgRNA are omitted for clarity.

The figures herein are for illustrative purposes only and are not necessarily drawn to scale.

DETAILED DESCRIPTION OF THE INVENTION

With respect to general information on CRISPR-Cas Systems, components thereof, and delivery of such components, including methods, materials, delivery vehicles, vectors, particles, AAV, and making and using thereof, including as to amounts and formulations, all useful in the practice of the instant invention, reference is made to: U.S. Pat. Nos. 8,697,359, 8,771,945, 8,795,965, 8,865,406, 8,871,445, 8,889,356, 8,889,418 and 8,895,308; US Patent Publications US 2014-0310830 (U.S. application Ser. No. 14/105,031), US 2014-0287938 A1 (U.S. application Ser. No. 14/213,991), US 2014-0273234 A1 (U.S. application Ser. No. 14/293,674), US 2014-0273232 A1 (U.S. application Ser. No. 14/290,575), US 2014-0273231 (U.S. application Ser. No. 14/259,420), US 2014-0256046 A1 (U.S. application Ser. No. 14/226,274), US 2014-0248702 A1 (U.S. application Ser. No. 14/258,458), US 2014-0242700 A1 (U.S. application Ser. No. 14/222,930), US 2014-0242699 A1 (U.S. application Ser. No. 14/183,512), US 2014-0242664 A1 (U.S. application Ser. No. 14/104,990), US 2014-0234972 A1 (U.S. application Ser. No. 14/183,471), US 2014-0227787 A1 (U.S. application Ser. No. 14/256,912), US 2014-0189896 A1 (U.S. application Ser. No. 14/105,035), US 2014-0186958 (U.S. application Ser. No. 14/105,017), US 2014-0186919 A1 (U.S. application Ser. No. 14/104,977), US 2014-0186843 A1 (U.S. application Ser. No. 14/104,900), US 2014-0179770 A1 (U.S. application Ser. No. 14/104,837) and US 2014-0179006 A1 (U.S. application Ser. No. 14/183,486), US 2014-0170753 (U.S. application Ser. No. 14/183,429); European Patent Applications EP 2 771 468 (EP13818570.7), EP 2 764 103 (EP13824232.6), and EP 2 784 162 (EP14170383.5); and PCT Patent Publications WO 2014/093661 (PCT/US2013/074743), WO 2014/093694 (PCT/US2013/074790), WO 2014/093595 (PCT/US2013/074611), WO 2014/093718 (PCT/US2013/074825), WO 2014/093709 (PCT/US2013/074812), WO 2014/093622 (PCT/US2013/074667), WO 2014/093635 (PCT/US2013/074691), WO 2014/093655 (PCT/US2013/074736), WO 2014/093712 (PCT/US2013/074819), WO 2014/093701 (PCT/US2013/074800), and WO 2014/018423 (PCT/US2013/051418). Reference is also made to U.S. provisional patent applications 61/758,468; 61/802,174; 61/806,375; 61/814,263; 61/819,803 and 61/828,130, filed on Jan. 30, 2013; Mar. 15, 2013; Mar. 28, 2013; Apr. 20, 2013; May 6, 2013 and May 28, 2013 respectively. Reference is also made to U.S. provisional patent application 61/836,123, filed on Jun. 17, 2013. Reference is additionally made to U.S. provisional patent applications 61/835,931, 61/835,936, 61/836,127, 61/836,101, 61/836,080 and 61/835,973, each filed Jun. 17, 2013. Further reference is made to U.S. provisional patent applications 61/862,468 and 61/862,355 filed on Aug. 5, 2013; 61/871,301 filed on Aug. 28, 2013; 61/960,777 filed on Sep. 25, 2013 and 61/961,980 filed on Oct. 28, 2013. Reference is yet further made to: PCT Patent applications Nos: PCT/US2014/041803, PCT/US2014/041800, PCT/US2014/041809, PCT/US2014/041804 and PCT/US2014/041806, each filed Jun. 10, 2014; PCT/US2014/041808 filed Jun. 11, 2014; and PCT/US2014/62558 filed Oct. 28, 2014, and U.S. Provisional Patent Applications Ser. Nos. 61/915,150, 61/915,301, 61/915,267 and 61/915,260, each filed Dec. 12, 2013; 61/757,972 and 61/768,959, filed on Jan. 29, 2013 and Feb. 25, 2013; 61/835,936, 61/836,127, 61/836,101, 61/836,080, 61/835,973, and 61/835,931, filed Jun. 17, 2013; 62/010,888 and 62/010,879, both filed Jun. 11, 2014; 62/010,329 and 62/010,441, each filed Jun. 10, 2014; 61/939,228 and 61/939,242, each filed Feb. 12, 2014; 61/980,012, filed Apr. 15, 2014; 62/038,358, filed Aug. 17, 2014; 62/054,490, 62/055,484, 62/055,460 and 62/055,487, each filed Sep. 25, 2014; and 62/069,243, filed Oct. 27, 2014. Each of these patents, patent publications, and applications, and all documents cited therein or during their prosecution (“appln cited documents”) and all documents cited or referenced in the appln cited documents, together with any instructions, descriptions, product specifications, and product sheets for any products mentioned therein or in any document therein and incorporated by reference herein, are hereby incorporated herein by reference, and may be employed in the practice of the invention. All documents (e.g., these patents, patent publications and applications and the appln cited documents) are incorporated herein by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.

Also with respect to general information on CRISPR-Cas Systems, mention is made of:

-   Multiplex genome engineering using CRISPR/Cas systems. Cong, L., Ran, F. A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P. D., Wu, X., Jiang, W., Marraffini, L. A., & Zhang, F. Science February 15; 339(6121):819-23 (2013);
-   RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Jiang W., Bikard D., Cox D., Zhang F, Marraffini L A. Nat Biotechnol March; 31(3):233-9 (2013);
-   One-Step Generation of Mice Carrying Mutations in Multiple Genes by CRISPR/Cas-Mediated Genome Engineering. Wang H., Yang H., Shivalila C S., Dawlaty M M., Cheng A W., Zhang F., Jaenisch R. Cell May 9; 153(4):910-8 (2013);
-   Optical control of mammalian endogenous transcription and epigenetic states. Konermann S, Brigham M D, Trevino A E, Hsu P D, Heidenreich M, Cong L, Platt R J, Scott D A, Church G M, Zhang F. Nature. 2013 Aug. 22; 500(7463):472-6. doi: 10.1038/nature12466. Epub 2013 Aug. 23;
-   Double Nicking by RNA-Guided CRISPR Cas9 for Enhanced Genome Editing Specificity. Ran, F A., Hsu, P D., Lin, C Y., Gootenberg, J S., Konermann, S., Trevino, A E., Scott, D A., Inoue, A., Matoba, S., Zhang, Y., & Zhang, F. Cell August 28. pii: S0092-8674(13)01015-5. (2013);
-   DNA targeting specificity of RNA-guided Cas9 nucleases. Hsu, P., Scott, D., Weinstein, J., Ran, F A., Konermann, S., Agarwala, V., Li, Y., Fine, E., Wu, X., Shalem, O., Cradick, T J., Marraffini, L A., Bao, G., & Zhang, F. Nat Biotechnol doi:10.1038/nbt.2647 (2013);
-   Genome engineering using the CRISPR-Cas9 system. Ran, F A., Hsu, P D., Wright, J., Agarwala, V., Scott, D A., Zhang, F. Nature Protocols November; 8(11):2281-308. (2013);
-   Genome-Scale CRISPR-Cas9 Knockout Screening in Human Cells. Shalem, O., Sanjana, N E., Hartenian, E., Shi, X., Scott, D A., Mikkelsen, T., Heckl, D., Ebert, B L., Root, D E., Doench, J G., Zhang, F. Science December 12. (2013). [Epub ahead of print];
-   Crystal structure of cas9 in complex with guide RNA and target DNA. Nishimasu, H., Ran, F A., Hsu, P D., Konermann, S., Shehata, S I., Dohmae, N., Ishitani, R., Zhang, F., Nureki, O. Cell February 27. (2014). 156(5):935-49;
-   Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Wu X., Scott D A., Kriz A J., Chiu A C., Hsu P D., Dadon D B., Cheng A W., Trevino A E., Konermann S., Chen S., Jaenisch R., Zhang F., Sharp P A. Nat Biotechnol. (2014) April 20. doi: 10.1038/nbt.2889,
-   CRISPR-Cas9 Knockin Mice for Genome Editing and Cancer Modeling, Platt et al., Cell 159(2): 440-455 (2014) DOI: 10.1016/j.cell.2014.09.014,
-   Development and Applications of CRISPR-Cas9 for Genome Engineering, Hsu et al, Cell 157, 1262-1278 (Jun. 5, 2014) (Hsu 2014),
-   Genetic screens in human cells using the CRISPR/Cas9 system, Wang et al., Science. 2014 Jan. 3; 343(6166): 80-84. doi:10.1126/science.1246981,
-   Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation, Doench et al., Nature Biotechnology published online 3 Sep. 2014; doi:10.1038/nbt.3026, and
-   In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9, Swiech et al, Nature Biotechnology; published online 19 Oct. 2014; doi:10.1038/nbt.3055.

each of which is incorporated herein by reference, and discussed briefly below:

Cong et al. engineered type II CRISPR-Cas systems for use in eukaryotic cells based on both Streptococcus thermophilus Cas9 and also Streptococcus pyogenes Cas9 and demonstrated that Cas9 nucleases can be directed by short RNAs to induce precise cleavage of DNA in human and mouse cells. Their study further showed that Cas9 as converted into a nicking enzyme can be used to facilitate homology-directed repair in eukaryotic cells with minimal mutagenic activity. Additionally, their study demonstrated that multiple guide sequences can be encoded into a single CRISPR array to enable simultaneous editing of several sites at endogenous genomic loci within the mammalian genome, demonstrating easy programmability and wide applicability of the RNA-guided nuclease technology. This ability to use RNA to program sequence specific DNA cleavage in cells defined a new class of genome engineering tools. These studies further showed that other CRISPR loci are likely to be transplantable into mammalian cells and can also mediate mammalian genome cleavage. Importantly, it can be envisaged that several aspects of the CRISPR-Cas system can be further improved to increase its efficiency and versatility.

Jiang et al. used the clustered, regularly interspaced, short palindromic repeats (CRISPR)-associated Cas9 endonuclease complexed with dual-RNAs to introduce precise mutations in the genomes of Streptococcus pneumoniae and Escherichia coli. The approach relied on dual-RNA:Cas9-directed cleavage at the targeted genomic site to kill unmutated cells and circumvented the need for selectable markers or counter-selection systems. The study reported reprogramming dual-RNA:Cas9 specificity by changing the sequence of short CRISPR RNA (crRNA) to make single- and multinucleotide changes carried on editing templates. The study showed that simultaneous use of two crRNAs enabled multiplex mutagenesis. Furthermore, when the approach was used in combination with recombineering, in S. pneumoniae, nearly 100% of cells that were recovered using the described approach contained the desired mutation, and in E. coli, 65% that were recovered contained the mutation.

Konermann et al. addressed the need in the art for versatile and robust technologies that enable optical and chemical modulation of DNA-binding domains based CRISPR Cas9 enzyme and also Transcriptional Activator Like Effectors

As discussed in the present specification, the Cas9 nuclease from the microbial CRISPR-Cas system is targeted to specific genomic loci by a 20 nt guide sequence, which can tolerate certain mismatches to the DNA target and thereby promote undesired off-target mutagenesis. To address this, Ran et al. described an approach that combined a Cas9 nickase mutant with paired guide RNAs to introduce targeted double-strand breaks. Because individual nicks in the genome are repaired with high fidelity, simultaneous nicking via appropriately offset guide RNAs is required for double-stranded breaks and extends the number of specifically recognized bases for target cleavage. The authors demonstrated that using paired nicking can reduce off-target activity by 50- to 1,500-fold in cell lines and facilitate gene knockout in mouse zygotes without sacrificing on-target cleavage efficiency. This versatile strategy enables a wide variety of genome editing applications that require high specificity.

Hsu et al. characterized SpCas9 targeting specificity in human cells to inform the selection of target sites and avoid off-target effects. The study evaluated >700 guide RNA variants and SpCas9-induced indel mutation levels at >100 predicted genomic off-target loci in 293T and 293FT cells. The authors showed that SpCas9 tolerates mismatches between guide RNA and target DNA at different positions in a sequence-dependent manner, sensitive to the number, position and distribution of mismatches. The authors further showed that SpCas9-mediated cleavage is unaffected by DNA methylation and that the dosage of SpCas9 and sgRNA can be titrated to minimize off-target modification. Additionally, to facilitate mammalian genome engineering applications, the authors reported providing a web-based software tool to guide the selection and validation of target sequences as well as off-target analyses.
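Because mismatch tolerance is described above as sensitive to the number, position and distribution of mismatches, a first step in any such analysis is to list where a guide and a candidate site differ. The following is a minimal sketch under stated assumptions: both sequences are written 5′ to 3′ in the DNA alphabet and have equal length, the function name `mismatch_positions` is hypothetical, and the two 20-nt sequences are made-up examples. It is not the web-based tool reported by the authors and assigns no weights to positions.

```python
from typing import List


def mismatch_positions(guide: str, site: str) -> List[int]:
    """Return 1-based positions (from the 5' end) where guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper()), start=1)
            if g != s]


# Made-up 20-nt guide and a candidate site differing at positions 3 and 18.
guide = "GACGTTACCGGATCAATGCA"
candidate_site = "GATGTTACCGGATCAATACA"

positions = mismatch_positions(guide, candidate_site)  # [3, 18]
number_of_mismatches = len(positions)                  # 2
```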

Ran et al. described a set of tools for Cas9-mediated genome editing vianon-homologous end joining (NHEJ) or homology-directed repair (HDR) inmammalian cells, as well as generation of modified cell lines fordownstream functional studies. To minimize off-target cleavage, theauthors further described a double-nicking strategy using the Cas9nickase mutant with paired guide RNAs. The protocol provided by theauthors experimentally derived guidelines for the selection of targetsites, evaluation of cleavage efficiency and analysis of off-targetactivity. The studies showed that beginning with target design, genemodifications can be achieved within as little as 1-2 weeks, andmodified clonal cell lines can be derived within 2-3 weeks.

Shalem et al. described a new way to interrogate gene function on a genome-wide scale. Their studies showed that delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enabled both negative and positive selection screening in human cells. First, the authors showed use of the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, the authors screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic that inhibits mutant protein kinase BRAF. Their studies showed that the highest-ranking candidates included previously validated genes NF1 and MED12 as well as novel hits NF2, CUL3, TADA2B, and TADA1. The authors observed a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, and thus demonstrated the promise of genome-scale screening with Cas9.

Nishimasu et al. reported the crystal structure of Streptococcus pyogenes Cas9 in complex with sgRNA and its target DNA at 2.5 Å resolution. The structure revealed a bilobed architecture composed of target recognition and nuclease lobes, accommodating the sgRNA:DNA heteroduplex in a positively charged groove at their interface. Whereas the recognition lobe is essential for binding sgRNA and DNA, the nuclease lobe contains the HNH and RuvC nuclease domains, which are properly positioned for cleavage of the complementary and non-complementary strands of the target DNA, respectively. The nuclease lobe also contains a carboxyl-terminal domain responsible for the interaction with the protospacer adjacent motif (PAM). This high-resolution structure and accompanying functional analyses have revealed the molecular mechanism of RNA-guided DNA targeting by Cas9, thus paving the way for the rational design of new, versatile genome-editing technologies.

Wu et al. mapped genome-wide binding sites of a catalytically inactive Cas9 (dCas9) from Streptococcus pyogenes loaded with single guide RNAs (sgRNAs) in mouse embryonic stem cells (mESCs). The authors showed that each of the four sgRNAs tested targets dCas9 to between tens and thousands of genomic sites, frequently characterized by a 5-nucleotide seed region in the sgRNA and an NGG protospacer adjacent motif (PAM). Chromatin inaccessibility decreases dCas9 binding to other sites with matching seed sequences; thus 70% of off-target sites are associated with genes. The authors showed that targeted sequencing of 295 dCas9 binding sites in mESCs transfected with catalytically active Cas9 identified only one site mutated above background levels. The authors proposed a two-state model for Cas9 binding and cleavage, in which a seed match triggers binding but extensive pairing with target DNA is required for cleavage.

Hsu 2014 is a review article that discusses generally CRISPR-Cas9 history from yogurt to genome editing, including genetic screening of cells, that is in the information, data and findings of the applications in the lineage of this specification filed prior to Jun. 5, 2014. The general teachings of Hsu 2014 do not involve the specific models or animals of the instant specification.

In general, the CRISPR-Cas or CRISPR system is as used in the foregoing documents, such as WO 2014/093622 (PCT/US2013/074667) and refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is herein used (e.g., RNA(s) to guide Cas9, e.g. CRISPR RNA and transactivating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)) or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell. In some embodiments, direct repeats may be identified in silico by searching for repetitive motifs that fulfill any or all of the following criteria: 1. found in a 2 Kb window of genomic sequence flanking the type II CRISPR locus; 2. span from 20 to 50 bp; and 3. interspaced by 20 to 50 bp. In some embodiments, 2 of these criteria may be used, for instance 1 and 2, 2 and 3, or 1 and 3. In some embodiments, all 3 criteria may be used.
In some embodiments it may be preferred in a CRISPR complex that the tracr sequence has one or more hairpins and is 30 or more nucleotides in length, 40 or more nucleotides in length, or 50 or more nucleotides in length; the guide sequence is between 10 and 30 nucleotides in length; and the CRISPR/Cas enzyme is a Type II Cas9 enzyme.
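The in silico criteria for direct repeats (a motif spanning 20 to 50 bp, interspaced by 20 to 50 bp, within a window of sequence flanking the locus) can be illustrated with a minimal sketch. The function name `find_direct_repeats`, the restriction to exact matches, and the toy sequences are assumptions made for illustration only; the specification does not prescribe a particular search tool, and the caller is assumed to have already extracted the flanking window.

```python
def find_direct_repeats(window, min_len=20, max_len=50, min_gap=20, max_gap=50):
    """Return (motif, first_start, second_start) for exact repeats in `window`.

    Assumption: `window` is the flanking genomic sequence already cut out by the
    caller (criterion 1). Only exact matches are considered, which is a
    simplification of a real motif search.
    """
    window = window.upper()
    n = len(window)
    hits = []
    for length in range(min_len, max_len + 1):          # criterion 2: 20 to 50 bp
        for start in range(n - length + 1):
            motif = window[start:start + length]
            for gap in range(min_gap, max_gap + 1):     # criterion 3: 20 to 50 bp spacing
                second = start + length + gap
                if second + length > n:
                    break
                if window[second:second + length] == motif:
                    hits.append((motif, start, second))
    return hits


if __name__ == "__main__":
    # Hypothetical toy window: repeat + 30-nt spacer + repeat.
    repeat = "GTTTTAGAGCTATGCTGTTTTG"
    spacer = "ACGTACGTACGTACGTACGTACGTACGTAC"
    for motif, first, second in find_direct_repeats(repeat + spacer + repeat):
        print(len(motif), first, second)
```

Shorter sub-motifs of the same repeat are also reported by this sketch; a practical search would typically keep only the longest hit at each position.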

In embodiments of the invention the terms guide sequence and guide RNA are used interchangeably as in foregoing cited documents such as WO 2014/093622 (PCT/US2013/074667). In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g. the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a CRISPR complex to a target sequence may be assessed by any suitable assay.
For example, the components of a CRISPR system sufficient to form a CRISPR complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a CRISPR complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.
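As an illustration of the degree of complementarity discussed above, the following minimal sketch computes a percent complementarity for a guide and a target strand of equal length. It is a simplification under stated assumptions: it uses an ungapped, end-to-end comparison rather than any of the alignment algorithms named above, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a nucleotide string (U is treated as T)."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def percent_complementarity(guide, target_strand):
    """Percent of guide positions that base-pair with `target_strand`.

    Assumption: both sequences are written 5' to 3', have equal length, and are
    compared without gaps; this is not an optimal alignment.
    """
    guide = guide.upper().replace("U", "T")
    paired_with = reverse_complement(target_strand)
    if len(guide) != len(paired_with):
        raise ValueError("guide and target must have the same length")
    matches = sum(g == t for g, t in zip(guide, paired_with))
    return 100.0 * matches / len(guide)


if __name__ == "__main__":
    guide = "GAGTCCGAGCAGAAGAAGAA"          # guide sequence written as DNA letters
    target = reverse_complement(guide)      # perfectly complementary strand
    print(percent_complementarity(guide, target))
```

A percentage returned by such a function could then be compared against a chosen threshold, for example 90%.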

A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome. In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. In some embodiments, about or less than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the guide sequence participate in self-complementary base pairing when optimally folded. Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g. A. R. Gruber et al., 2008, Cell 106(1): 23-24; and P A Carr and G M Church, 2009, Nature Biotechnology 27(12): 1151-62).
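The fraction of guide nucleotides participating in self-complementary base pairing can be estimated with a rough sketch. Note the swapped-in technique: the code below uses a simple base-pair maximization (Nussinov-style dynamic programming) as a stand-in, not the free-energy minimization of mFold or the centroid prediction of RNAfold, so it tends to overestimate pairing. The function names and the minimum loop length of 3 are assumptions.

```python
_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_pairs(seq, min_loop=3):
    """Maximum number of nested Watson-Crick pairs (Nussinov-style DP).

    Assumption: at least `min_loop` unpaired bases separate paired positions;
    G-U wobble pairs and energies are ignored.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # position j left unpaired
            for k in range(i, j - min_loop):         # position j paired with k
                if (seq[k], seq[j]) in _PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def paired_fraction(seq, min_loop=3):
    """Upper-bound fraction of nucleotides involved in self-pairing."""
    return 2 * max_pairs(seq, min_loop) / len(seq) if seq else 0.0


if __name__ == "__main__":
    print(paired_fraction("GAGUCCGAGCAGAAGAAGAA"))
```

A guide whose estimated paired fraction exceeds a chosen threshold could then be flagged for closer inspection with a dedicated folding program.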

In general, a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a CRISPR complex at a target sequence, wherein the CRISPR complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin. In an embodiment of the invention, the transcript or transcribed polynucleotide sequence has at least two or more hairpins. In preferred embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins.
In a hairpin structure the portion of the sequence 5′ of the final “N” and upstream of the loop corresponds to the tracr mate sequence, and the portion of the sequence 3′ of the loop corresponds to the tracr sequence. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5′ to 3′), where “N” represents a base of a guide sequence, the first block of lower case letters represent the tracr mate sequence, and the second block of lower case letters represent the tracr sequence, and the final poly-T sequence represents the transcription terminator: (1) NNNNNNNNNNNNNNNNNNNNgtttttgtactctcaagatttaGAAAtaaatcttgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT (SEQ ID NO: 3); (2) NNNNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT (SEQ ID NO: 4); (3) NNNNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgtTTTTTT (SEQ ID NO: 5); (4) NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAAtagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgcTTTTTT (SEQ ID NO: 6); (5) NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAATAGcaagttaaaataaggctagtccgttatcaacttgaaaaagtgTTTTTTT (SEQ ID NO: 7); and (6) NNNNNNNNNNNNNNNNNNNNgttttagagctagAAATAGcaagttaaaataaggctagtccgttatcaTTTTTTTT (SEQ ID NO: 8). In some embodiments, sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from S. pyogenes. In some embodiments, the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.
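The layout of such a single polynucleotide (guide sequence, tracr mate sequence, loop, tracr sequence, poly-T terminator) can be sketched as a simple concatenation. The sketch below uses the lower-case and upper-case blocks of sequence (4) (SEQ ID NO: 6) as listed above; reading the upper-case GAAA block as the loop, the function name `build_chimeric_rna`, and the choice of example guide are assumptions made for illustration.

```python
# Blocks copied from sequence (4), SEQ ID NO: 6, listed 5' to 3'.
TRACR_MATE = "gttttagagcta"
LOOP = "GAAA"  # assumption: the upper-case block between the two lower-case blocks
TRACR = "tagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc"
TERMINATOR = "TTTTTT"


def build_chimeric_rna(guide, guide_length=20):
    """Replace the N positions of the template with a concrete guide sequence."""
    guide = guide.upper()
    if len(guide) != guide_length:
        raise ValueError(f"guide must be {guide_length} nt long")
    if set(guide) - set("ACGT"):
        raise ValueError("guide must contain only A, C, G, T")
    return guide + TRACR_MATE + LOOP + TRACR + TERMINATOR


if __name__ == "__main__":
    # The guide below is used purely as an example input.
    print(build_chimeric_rna("GAGTCCGAGCAGAAGAAGAA"))
```

Keeping the blocks as separate constants makes it straightforward to substitute the blocks of sequences (1) to (3), (5) or (6) instead.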

In some embodiments, candidate tracrRNA may be subsequently predicted by sequences that fulfill any or all of the following criteria: 1. sequence homology to direct repeats (motif search in Geneious with up to 18-bp mismatches); 2. presence of a predicted Rho-independent transcriptional terminator in direction of transcription; and 3. stable hairpin secondary structure between tracrRNA and direct repeat. In some embodiments, 2 of these criteria may be used, for instance 1 and 2, 2 and 3, or 1 and 3. In some embodiments, all 3 criteria may be used.

In some embodiments, chimeric synthetic guide RNA (sgRNA) designs may incorporate at least 12 bp of duplex structure between the direct repeat and tracrRNA.

For minimization of toxicity and off-target effect, it will be important to control the concentration of CRISPR enzyme mRNA and guide RNA delivered. Optimal concentrations of CRISPR enzyme mRNA and guide RNA can be determined by testing different concentrations in a cellular or non-human eukaryote animal model and using deep sequencing to analyze the extent of modification at potential off-target genomic loci. For example, for the guide sequence targeting 5′-GAGTCCGAGCAGAAGAAGAA-3′ (SEQ ID NO: 9) in the EMX1 gene of the human genome, deep sequencing can be used to assess the level of modification at the following two off-target loci, 1: 5′-GAGTCCTAGCAGGAGAAGAA-3′ (SEQ ID NO: 10) and 2: 5′-GAGTCTAAGCAGAAGAAGAA-3′ (SEQ ID NO: 11). The concentration that gives the highest level of on-target modification while minimizing the level of off-target modification should be chosen for in vivo delivery. Alternatively, to minimize the level of toxicity and off-target effect, CRISPR enzyme nickase mRNA (for example S. pyogenes Cas9 with the D10A mutation) can be delivered with a pair of guide RNAs targeting a site of interest. The two guide RNAs need to be spaced as follows. Guide sequences and strategies to minimize toxicity and off-target effects can be as in WO 2014/093622 (PCT/US2013/074667).
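The relationship between the EMX1 guide target (SEQ ID NO: 9) and the two off-target loci (SEQ ID NOS: 10 and 11) can be made concrete by listing the mismatched positions. This is a minimal sketch; the function name `mismatch_positions` and the 1-based, 5′-to-3′ position convention are assumptions made for illustration.

```python
def mismatch_positions(on_target, off_target):
    """Return 1-based positions (5' to 3') where two equal-length sites differ."""
    if len(on_target) != len(off_target):
        raise ValueError("sequences must have the same length")
    return [
        index
        for index, (a, b) in enumerate(zip(on_target.upper(), off_target.upper()), start=1)
        if a != b
    ]


ON_TARGET = "GAGTCCGAGCAGAAGAAGAA"     # SEQ ID NO: 9
OFF_TARGET_1 = "GAGTCCTAGCAGGAGAAGAA"  # SEQ ID NO: 10
OFF_TARGET_2 = "GAGTCTAAGCAGAAGAAGAA"  # SEQ ID NO: 11

if __name__ == "__main__":
    print(mismatch_positions(ON_TARGET, OFF_TARGET_1))  # [7, 13]
    print(mismatch_positions(ON_TARGET, OFF_TARGET_2))  # [6, 7]
```

Each off-target locus thus differs from the on-target sequence at two positions, which is the kind of near match that deep sequencing would be used to monitor.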

The CRISPR system is derived advantageously from a type II CRISPR system. In some embodiments, one or more elements of a CRISPR system is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes. In preferred embodiments of the invention, the CRISPR system is a type II CRISPR system and the Cas enzyme is Cas9, which catalyzes DNA cleavage. Non-limiting examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologues thereof, or modified versions thereof.

In some embodiments, the unmodified CRISPR enzyme has DNA cleavage activity, such as Cas9. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, a vector encodes a CRISPR enzyme that is mutated with respect to a corresponding wild-type enzyme such that the mutated CRISPR enzyme lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand). Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A. As a further example, two or more catalytic domains of Cas9 (RuvC I, RuvC II, and RuvC III or the HNH domain) may be mutated to produce a mutated Cas9 substantially lacking all DNA cleavage activity. In some embodiments, a D10A mutation is combined with one or more of H840A, N854A, or N863A mutations to produce a Cas9 enzyme substantially lacking all DNA cleavage activity. In some embodiments, a CRISPR enzyme is considered to substantially lack all DNA cleavage activity when the DNA cleavage activity of the mutated enzyme is about no more than 25%, 10%, 5%, 1%, 0.1%, 0.01%, or less of the DNA cleavage activity of the non-mutated form of the enzyme; an example can be when the DNA cleavage activity of the mutated form is nil or negligible as compared with the non-mutated form.
Where the enzyme is not SpCas9, mutations may be made at any or all residues corresponding to positions 10, 762, 840, 854, 863 and/or 986 of SpCas9 (which may be ascertained for instance by standard sequence comparison tools). In particular, any or all of the following mutations are preferred in SpCas9: D10A, E762A, H840A, N854A, N863A and/or D986A; conservative substitution for any of the replacement amino acids is also envisaged. The same (or conservative substitutions of these mutations) at corresponding positions in other Cas9s are also preferred. Particularly preferred are D10 and H840 in SpCas9. However, in other Cas9s, residues corresponding to SpCas9 D10 and H840 are also preferred. Orthologs of SpCas9 can be used in the practice of the invention. A Cas enzyme may be identified as Cas9, as this can refer to the general class of enzymes that share homology to the biggest nuclease with multiple nuclease domains from the type II CRISPR system. Most preferably, the Cas9 enzyme is from, or is derived from, spCas9 (S. pyogenes Cas9) or saCas9 (S. aureus Cas9). “StCas9” refers to wild type Cas9 from S. thermophilus, the protein sequence of which is given in the SwissProt database under accession number G3ECR1. Similarly, S. pyogenes Cas9 or spCas9 is included in SwissProt under accession number Q99ZW2. By derived, Applicants mean that the derived enzyme is largely based, in the sense of having a high degree of sequence homology with, a wildtype enzyme, but that it has been mutated (modified) in some way as described herein. It will be appreciated that the terms Cas and CRISPR enzyme are generally used herein interchangeably, unless otherwise apparent. As mentioned above, many of the residue numberings used herein refer to the Cas9 enzyme from the type II CRISPR locus in Streptococcus pyogenes.
However, it will be appreciated that this invention includes many more Cas9s from other species of microbes, such as SpCas9, SaCas9, St1Cas9 and so forth. Enzymatic action by Cas9 derived from Streptococcus pyogenes or any closely related Cas9 generates double stranded breaks at target site sequences which hybridize to 20 nucleotides of the guide sequence and that have a protospacer-adjacent motif (PAM) sequence (examples include NGG/NRG or a PAM that can be determined as described herein) following the 20 nucleotides of the target sequence. CRISPR activity through Cas9 for site-specific DNA recognition and cleavage is defined by the guide sequence, the tracr sequence that hybridizes in part to the guide sequence and the PAM sequence. More aspects of the CRISPR system are described in Karginov and Hannon, The CRISPR system: small RNA-guided defence in bacteria and archaea, Mol Cell 2010, Jan. 15; 37(1): 7. The type II CRISPR locus from Streptococcus pyogenes SF370 contains a cluster of four genes, Cas9, Cas1, Cas2, and Csn2, as well as two non-coding RNA elements, tracrRNA and a characteristic array of repetitive sequences (direct repeats) interspaced by short stretches of non-repetitive sequences (spacers, about 30 bp each). In this system, targeted DNA double-strand break (DSB) is generated in four sequential steps. First, two non-coding RNAs, the pre-crRNA array and tracrRNA, are transcribed from the CRISPR locus. Second, tracrRNA hybridizes to the direct repeats of pre-crRNA, which is then processed into mature crRNAs containing individual spacer sequences. Third, the mature crRNA:tracrRNA complex directs Cas9 to the DNA target consisting of the protospacer and the corresponding PAM via heteroduplex formation between the spacer region of the crRNA and the protospacer DNA. Finally, Cas9 mediates cleavage of target DNA upstream of PAM to create a DSB within the protospacer.
A pre-crRNA array consisting of a single spacer flanked by two direct repeats (DRs) is also encompassed by the term “tracr-mate sequences”. In certain embodiments, Cas9 may be constitutively present or inducibly present or conditionally present or administered or delivered. Cas9 optimization may be used to enhance function or to develop new functions; for example, one can generate chimeric Cas9 proteins. Cas9 may also be used as a generic DNA binding protein.
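The target-site requirement described above (20 nucleotides of target sequence followed by a PAM such as NGG) lends itself to a short illustration. The following sketch scans only the given strand of a DNA string for 20-nt protospacers immediately followed by NGG; the function name `find_ngg_targets`, the single-strand scan, and the toy input are assumptions, and other PAMs such as NRG are not handled.

```python
def find_ngg_targets(dna, protospacer_len=20):
    """Yield (start, protospacer, pam) for each site followed by an NGG PAM.

    Assumption: only the strand given in `dna` (5' to 3') is scanned; a real
    search would also scan the reverse complement.
    """
    dna = dna.upper()
    site_len = protospacer_len + 3
    for start in range(len(dna) - site_len + 1):
        protospacer = dna[start:start + protospacer_len]
        pam = dna[start + protospacer_len:start + site_len]
        if pam[1:] == "GG":                      # N can be any base
            yield start, protospacer, pam


if __name__ == "__main__":
    # Toy sequence: a 20-nt protospacer with a GGG PAM appended for illustration.
    toy = "GAGTCCGAGCAGAAGAAGAA" + "GGG" + "TTT"
    for hit in find_ngg_targets(toy):
        print(hit)
```

Candidate sites found this way would still need to be checked for uniqueness in the target genome.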

With respect to mutations of the CRISPR enzyme, when the enzyme is not SpCas9, mutations may be made at any or all residues corresponding to positions 10, 762, 840, 854, 863 and/or 986 of SpCas9 (which may be ascertained for instance by standard sequence comparison tools). In particular, any or all of the following mutations are preferred in SpCas9: D10A, E762A, H840A, N854A, N863A and/or D986A; conservative substitution for any of the replacement amino acids is also envisaged. In an aspect the invention provides as to any or each or all embodiments herein-discussed wherein the CRISPR enzyme comprises at least one or more, or at least two or more mutations, wherein the at least one or more mutation or the at least two or more mutations is as to D10, E762, H840, N854, N863, or D986 according to SpCas9 protein, e.g., D10A, E762A, H840A, N854A, N863A and/or D986A as to SpCas9, or N580 according to SaCas9, e.g., N580A as to SaCas9, or any corresponding mutation(s) in a Cas9 of an ortholog to Sp or Sa, or the CRISPR enzyme comprises at least one mutation wherein at least H840 or N863A as to Sp Cas9 or N580A as to Sa Cas9 is mutated; e.g., wherein the CRISPR enzyme comprises H840A, or D10A and H840A, or D10A and N863A, according to SpCas9 protein, or any corresponding mutation(s) in a Cas9 of an ortholog to Sp protein or Sa protein.

Typically, in the context of an endogenous CRISPR system, formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins) results in cleavage of one or both strands in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence. Without wishing to be bound by theory, the tracr sequence, which may comprise or consist of all or a portion of a wild-type tracr sequence (e.g. about or more than about 20, 26, 32, 45, 48, 54, 63, 67, 85, or more nucleotides of a wild-type tracr sequence), may also form part of a CRISPR complex, such as by hybridization along at least a portion of the tracr sequence to all or a portion of a tracr mate sequence that is operably linked to the guide sequence.

An example of a codon optimized sequence is, in this instance, a sequence optimized for expression in a eukaryote, e.g., humans (i.e. being optimized for expression in humans), or for another eukaryote, animal or mammal as herein discussed; see, e.g., SaCas9 human codon optimized sequence in WO 2014/093622 (PCT/US2013/074667). Whilst this is preferred, it will be appreciated that other examples are possible and codon optimization for a host species other than human, or codon optimization for specific organs, is known. In some embodiments, an enzyme coding sequence encoding a CRISPR enzyme is codon optimized for expression in particular cells, such as eukaryotic cells. The eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, or non-human eukaryote or animal or mammal as herein discussed, e.g., mouse, rat, rabbit, dog, livestock, or non-human mammal or primate. In some embodiments, processes for modifying the germ line genetic identity of human beings and/or processes for modifying the genetic identity of animals which are likely to cause them suffering without any substantial medical benefit to man or animal, and also animals resulting from such processes, may be excluded. In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit particular bias for certain codons of a particular amino acid.
Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon/ and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. In some embodiments, one or more codons (e.g. 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding a CRISPR enzyme correspond to the most frequently used codon for a particular amino acid.
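The codon replacement step described above (swapping codons for more frequently used synonymous codons while maintaining the native amino acid sequence) can be sketched as follows. This is a minimal illustration, not a full optimizer: the `preferred` mapping is supplied by the caller, the example mapping is hypothetical rather than taken from the Codon Usage Database, and the function names are assumptions. The standard genetic code is built from its conventional compact TCAG-ordered form.

```python
# Standard genetic code in the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds, preferred):
    """Replace each codon with the caller's preferred synonymous codon.

    `preferred` maps one-letter amino acids to codons; amino acids absent from
    the mapping keep their native codon. The amino acid sequence is unchanged.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TO_AA[codon]
        replacement = preferred.get(amino_acid, codon)
        if CODON_TO_AA[replacement] != amino_acid:
            raise ValueError(f"{replacement} does not encode {amino_acid}")
        out.append(replacement)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical preference map for illustration only.
    example_preferred = {"L": "CTG", "K": "AAG", "P": "CCC"}
    print(codon_optimize("ATGTTAAAACCGTAA", example_preferred))
```

In practice the preference map would be derived from a codon usage table for the host of interest.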

In some embodiments, a vector encodes a CRISPR enzyme comprising one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the CRISPR enzyme comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g. zero or at least one or more NLS at the amino-terminus and zero or at least one or more NLS at the carboxy terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In a preferred embodiment of the invention, the CRISPR enzyme comprises at most 6 NLSs. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 12); the NLS from nucleoplasmin (e.g.
the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 13)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 14) or RQRRNELKRSP (SEQ ID NO: 15); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 16); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 17) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 18) and PPKKARED (SEQ ID NO: 19) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 20) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 21) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 22) and PKQKKRK (SEQ ID NO: 23) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 24) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 25) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 26) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 27) of the steroid hormone receptors (human) glucocorticoid. In general, the one or more NLSs are of sufficient strength to drive accumulation of the CRISPR enzyme in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs in the CRISPR enzyme, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the CRISPR enzyme, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay.
Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of CRISPR complex formation (e.g. assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity affected by CRISPR complex formation and/or CRISPR enzyme activity), as compared to a control not exposed to the CRISPR enzyme or complex, or exposed to a CRISPR enzyme lacking the one or more NLSs.
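Whether an NLS lies near the N- or C-terminus of a polypeptide, in the sense described above, can be checked with a short sketch. The convention used here is an assumption: distance is counted as the number of residues between the terminus and the nearest residue of the NLS, and the function names and toy protein strings are hypothetical. The NLS used is the SV40 large T-antigen sequence PKKKRKV (SEQ ID NO: 12).

```python
SV40_NLS = "PKKKRKV"  # SEQ ID NO: 12


def nls_starts(protein, nls):
    """Return all 0-based start positions of `nls` in `protein`."""
    starts = []
    position = protein.find(nls)
    while position != -1:
        starts.append(position)
        position = protein.find(nls, position + 1)
    return starts


def nls_near_termini(protein, nls, window=10):
    """Return (near_n_terminus, near_c_terminus) for the given window.

    Assumption: an NLS counts as near a terminus when at most `window` residues
    separate the terminus from the nearest NLS residue.
    """
    starts = nls_starts(protein, nls)
    near_n = any(start <= window for start in starts)
    near_c = any(len(protein) - (start + len(nls)) <= window for start in starts)
    return near_n, near_c


if __name__ == "__main__":
    # Toy polypeptides made up for illustration.
    print(nls_near_termini("M" + SV40_NLS + "A" * 50, SV40_NLS, window=5))
    print(nls_near_termini("A" * 50 + SV40_NLS, SV40_NLS, window=5))
```

The same check could be repeated for each NLS in a construct carrying several different NLSs.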

Aspects of the invention relate to the expression of the gene product being decreased or a template polynucleotide being further introduced into the DNA molecule encoding the gene product or an intervening sequence being excised precisely by allowing the two 5′ overhangs to reanneal and ligate or the activity or function of the gene product being altered or the expression of the gene product being increased. In an embodiment of the invention, the gene product is a protein. sgRNA pairs creating 5′ overhangs with less than 8 bp overlap between the guide sequences (offset greater than −8 bp) were able to mediate detectable indel formation. Importantly, each guide used in these assays is able to efficiently induce indels when paired with wildtype Cas9, indicating that the relative positions of the guide pairs are the most important parameters in predicting double nicking activity. Since Cas9n and Cas9H840A nick opposite strands of DNA, substitution of Cas9n with Cas9H840A with a given sgRNA pair should have resulted in the inversion of the overhang type; but no indel formation is observed with Cas9H840A, indicating that Cas9H840A is a CRISPR enzyme substantially lacking all DNA cleavage activity (which is when the DNA cleavage activity of the mutated enzyme is about no more than 25%, 10%, 5%, 1%, 0.1%, 0.01%, or less of the DNA cleavage activity of the non-mutated form of the enzyme; whereby an example can be when the DNA cleavage activity of the mutated form is nil or negligible as compared with the non-mutated form, e.g., when no indel formation is observed as with Cas9H840A in the eukaryotic system in contrast to the biochemical or prokaryotic systems). Nonetheless, a pair of sgRNAs that will generate a 5′ overhang with Cas9n should in principle generate the corresponding 3′ overhang instead, and double nicking. Therefore, sgRNA pairs that lead to the generation of a 3′ overhang with Cas9n can be used with another mutated Cas9 to generate a 5′ overhang, and double nicking.
Accordingly,in some embodiments, a recombination template is also provided. Arecombination template may be a component of another vector as describedherein, contained in a separate vector, or provided as a separatepolynucleotide. In some embodiments, a recombination template isdesigned to serve as a template in homologous recombination, such aswithin or near a target sequence nicked or cleaved by a CRISPR enzyme asa part of a CRISPR complex. A template polynucleotide may be of anysuitable length, such as about or more than about 10, 15, 20, 25, 50,75, 100, 150, 200, 500, 1000, or more nucleotides in length. In someembodiments, the template polynucleotide is complementary to a portionof a polynucleotide comprising the target sequence. When optimallyaligned, a template polynucleotide might overlap with one or morenucleotides of a target sequences (e.g. about or more than about 1, 5,10, 15, 20, or more nucleotides). In some embodiments, when a templatesequence and a polynucleotide comprising a target sequence are optimallyaligned, the nearest nucleotide of the template polynucleotide is withinabout 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000,10000, or more nucleotides from the target sequence.

In some embodiments, one or more vectors driving expression of one or more elements of a CRISPR system are introduced into a host cell such that expression of the elements of the CRISPR system directs formation of a CRISPR complex at one or more target sites. For example, a Cas enzyme, a guide sequence linked to a tracr-mate sequence, and a tracr sequence could each be operably linked to separate regulatory elements on separate vectors. Or, RNA(s) of the CRISPR System can be delivered to a transgenic Cas9 animal or mammal, e.g., an animal or mammal that constitutively or inducibly or conditionally expresses Cas9; or an animal or mammal that is otherwise expressing Cas9 or has cells containing Cas9, such as by way of prior administration thereto of a vector or vectors that code for and express in vivo Cas9. Alternatively, two or more of the elements expressed from the same or different regulatory elements may be combined in a single vector, with one or more additional vectors providing any components of the CRISPR system not included in the first vector. CRISPR system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element. The coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction. In some embodiments, a single promoter drives expression of a transcript encoding a CRISPR enzyme and one or more of the guide sequence, tracr mate sequence (optionally operably linked to the guide sequence), and a tracr sequence embedded within one or more intron sequences (e.g. each in a different intron, two or more in at least one intron, or all in a single intron). In some embodiments, the CRISPR enzyme, guide sequence, tracr mate sequence, and tracr sequence are operably linked to and expressed from the same promoter.
Delivery vehicles, vectors, particles, nanoparticles, formulations and components thereof for expression of one or more elements of a CRISPR system are as used in the foregoing documents, such as WO 2014/093622 (PCT/US2013/074667). In some embodiments, a vector comprises one or more insertion sites, such as a restriction endonuclease recognition sequence (also referred to as a “cloning site”). In some embodiments, one or more insertion sites (e.g. about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more insertion sites) are located upstream and/or downstream of one or more sequence elements of one or more vectors. In some embodiments, a vector comprises an insertion site upstream of a tracr mate sequence, and optionally downstream of a regulatory element operably linked to the tracr mate sequence, such that following insertion of a guide sequence into the insertion site and upon expression the guide sequence directs sequence-specific binding of a CRISPR complex to a target sequence in a eukaryotic cell. In some embodiments, a vector comprises two or more insertion sites, each insertion site being located between two tracr mate sequences so as to allow insertion of a guide sequence at each site. In such an arrangement, the two or more guide sequences may comprise two or more copies of a single guide sequence, two or more different guide sequences, or combinations of these. When multiple different guide sequences are used, a single expression construct may be used to target CRISPR activity to multiple different, corresponding target sequences within a cell. For example, a single vector may comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more guide sequences. In some embodiments, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more such guide-sequence-containing vectors may be provided, and optionally delivered to a cell.
In some embodiments, a vector comprises a regulatory element operably linked to an enzyme-coding sequence encoding a CRISPR enzyme, such as a Cas protein. CRISPR enzyme or CRISPR enzyme mRNA or CRISPR guide RNA or RNA(s) can be delivered separately; and advantageously at least one of these is delivered via a nanoparticle complex. CRISPR enzyme mRNA can be delivered prior to the guide RNA to give time for CRISPR enzyme to be expressed. CRISPR enzyme mRNA might be administered 1-12 hours (preferably around 2-6 hours) prior to the administration of guide RNA. Alternatively, CRISPR enzyme mRNA and guide RNA can be administered together. Advantageously, a second booster dose of guide RNA can be administered 1-12 hours (preferably around 2-6 hours) after the initial administration of CRISPR enzyme mRNA+guide RNA. Additional administrations of CRISPR enzyme mRNA and/or guide RNA might be useful to achieve the most efficient levels of genome modification.

In one aspect, the invention provides methods for using one or more elements of a CRISPR system. The CRISPR complex of the invention provides an effective means for modifying a target polynucleotide. The CRISPR complex of the invention has a wide variety of utility including modifying (e.g., deleting, inserting, translocating, inactivating, activating) a target polynucleotide in a multiplicity of cell types. As such the CRISPR complex of the invention has a broad spectrum of applications in, e.g., gene therapy, drug screening, disease diagnosis, and prognosis. An exemplary CRISPR complex comprises a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence within the target polynucleotide. The guide sequence is linked to a tracr mate sequence, which in turn hybridizes to a tracr sequence. In one embodiment, this invention provides a method of cleaving a target polynucleotide. The method comprises modifying a target polynucleotide using a CRISPR complex that binds to the target polynucleotide and effects cleavage of said target polynucleotide. Typically, the CRISPR complex of the invention, when introduced into a cell, creates a break (e.g., a single or a double strand break) in the genome sequence. For example, the method can be used to cleave a disease gene in a cell. The break created by the CRISPR complex can be repaired by a repair process such as the error prone non-homologous end joining (NHEJ) pathway or the high fidelity homology-directed repair (HDR). During these repair processes, an exogenous polynucleotide template can be introduced into the genome sequence. In some methods, the HDR process is used to modify the genome sequence. For example, an exogenous polynucleotide template comprising a sequence to be integrated flanked by an upstream sequence and a downstream sequence is introduced into a cell. The upstream and downstream sequences share sequence similarity with either side of the site of integration in the chromosome.
Where desired, a donor polynucleotide can be DNA, e.g., a DNA plasmid, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), a viral vector, a linear piece of DNA, a PCR fragment, a naked nucleic acid, or a nucleic acid complexed with a delivery vehicle such as a liposome or poloxamer. The exogenous polynucleotide template comprises a sequence to be integrated (e.g., a mutated gene). The sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a non-coding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function. The upstream and downstream sequences in the exogenous polynucleotide template are selected to promote recombination between the chromosomal sequence of interest and the donor polynucleotide. The upstream sequence is a nucleic acid sequence that shares sequence similarity with the genome sequence upstream of the targeted site for integration. Similarly, the downstream sequence is a nucleic acid sequence that shares sequence similarity with the chromosomal sequence downstream of the targeted site of integration. The upstream and downstream sequences in the exogenous polynucleotide template can have 75%, 80%, 85%, 90%, 95%, or 100% sequence identity with the targeted genome sequence. Preferably, the upstream and downstream sequences in the exogenous polynucleotide template have about 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the targeted genome sequence. In some methods, the upstream and downstream sequences in the exogenous polynucleotide template have about 99% or 100% sequence identity with the targeted genome sequence.
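By way of illustration only, the sequence identity thresholds stated above for the upstream and downstream sequences can be checked with a short calculation. The following Python sketch is not part of the disclosed methods; the function names percent_identity and arm_meets_threshold are hypothetical, and the sketch assumes that the two sequences have already been aligned to equal length without gaps.

```python
def percent_identity(arm: str, genomic: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences.

    Assumption for this sketch: no gaps, alignment already done elsewhere.
    """
    if not arm or len(arm) != len(genomic):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == g for a, g in zip(arm.upper(), genomic.upper()))
    return 100.0 * matches / len(arm)


def arm_meets_threshold(arm: str, genomic: str, min_identity: float = 95.0) -> bool:
    """True if the arm reaches the chosen identity threshold (95% by default here)."""
    return percent_identity(arm, genomic) >= min_identity


if __name__ == "__main__":
    upstream_arm = "ACGTACGTAC"      # hypothetical template arm
    genomic_flank = "ACGTACGTAA"     # hypothetical genomic sequence
    print(percent_identity(upstream_arm, genomic_flank))
    print(arm_meets_threshold(upstream_arm, genomic_flank, min_identity=75.0))
```

The default of 95% corresponds to the lower end of the preferred range given above; any of the other recited values (e.g., 75% or 99%) may be passed as min_identity.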
An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequence has about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp. In some methods, the exogenous polynucleotide template may further comprise a marker. Such a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers. The exogenous polynucleotide template of the invention can be constructed using recombinant techniques (see, for example, Sambrook et al., 2001 and Ausubel et al., 1996). In a method for modifying a target polynucleotide by integrating an exogenous polynucleotide template, a double stranded break is introduced into the genome sequence by the CRISPR complex, and the break is repaired via homologous recombination with an exogenous polynucleotide template such that the template is integrated into the genome. The presence of a double-stranded break facilitates integration of the template. In other embodiments, this invention provides a method of modifying expression of a polynucleotide in a eukaryotic cell. The method comprises increasing or decreasing expression of a target polynucleotide by using a CRISPR complex that binds to the polynucleotide. In some methods, a target polynucleotide can be inactivated to effect the modification of the expression in a cell. For example, upon the binding of a CRISPR complex to a target sequence in a cell, the target polynucleotide is inactivated such that the sequence is not transcribed, the coded protein is not produced, or the sequence does not function as the wild-type sequence does.
For example, a protein or microRNA coding sequence may be inactivated such that the protein or microRNA or pre-microRNA transcript is not produced. In some methods, a control sequence can be inactivated such that it no longer functions as a control sequence. As used herein, “control sequence” refers to any nucleic acid sequence that affects the transcription, translation, or accessibility of a nucleic acid sequence. Examples of a control sequence include a promoter, a transcription terminator, and an enhancer. The target polynucleotide of a CRISPR complex can be any polynucleotide endogenous or exogenous to the eukaryotic cell. For example, the target polynucleotide can be a polynucleotide residing in the nucleus of the eukaryotic cell. The target polynucleotide can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or a junk DNA). Examples of target polynucleotides include a sequence associated with a signaling biochemical pathway, e.g., a signaling biochemical pathway-associated gene or polynucleotide. Examples of target polynucleotides include a disease associated gene or polynucleotide. A “disease-associated” gene or polynucleotide refers to any gene or polynucleotide which is yielding transcription or translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissues compared with tissues or cells of a non-disease control. It may be a gene that becomes expressed at an abnormally high level; it may be a gene that becomes expressed at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated gene also refers to a gene possessing mutation(s) or genetic variation that is directly responsible or is in linkage disequilibrium with a gene(s) that is responsible for the etiology of a disease.
The transcribed or translated products may be known or unknown, and may be at a normal or abnormal level. Without wishing to be bound by theory, it is believed that the target sequence should be associated with a PAM (protospacer adjacent motif); that is, a short sequence recognized by the CRISPR complex. The precise sequence and length requirements for the PAM differ depending on the CRISPR enzyme used, but PAMs are typically 2-5 base pair sequences adjacent the protospacer (that is, the target sequence). Examples of PAM sequences are given in the examples section below, and the skilled person will be able to identify further PAM sequences for use with a given CRISPR enzyme. In some embodiments, the method comprises allowing a CRISPR complex to bind to the target polynucleotide to effect cleavage of said target polynucleotide thereby modifying the target polynucleotide, wherein the CRISPR complex comprises a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence within said target polynucleotide, wherein said guide sequence is linked to a tracr mate sequence which in turn hybridizes to a tracr sequence. In one aspect, the invention provides a method of modifying expression of a polynucleotide in a eukaryotic cell.
In some embodiments, the method comprises allowing a CRISPR complex to bind to the polynucleotide such that said binding results in increased or decreased expression of said polynucleotide; wherein the CRISPR complex comprises a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence within said polynucleotide, wherein said guide sequence is linked to a tracr mate sequence which in turn hybridizes to a tracr sequence. Similar considerations and conditions apply as above for methods of modifying a target polynucleotide. In fact, these sampling, culturing and re-introduction options apply across the aspects of the present invention. In one aspect, the invention provides for methods of modifying a target polynucleotide in a eukaryotic cell, which may be in vivo, ex vivo or in vitro. In some embodiments, the method comprises sampling a cell or population of cells from a human or non-human animal, and modifying the cell or cells. Culturing may occur at any stage ex vivo. The cell or cells may even be re-introduced into the non-human animal or plant. For re-introduced cells it is particularly preferred that the cells are stem cells.
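By way of illustration only, the association of a target sequence with an adjacent PAM, as discussed above, can be sketched as a simple sequence scan. The following Python sketch is not part of the disclosed methods. It assumes, for the example, the 5′-NGG-3′ PAM of S. pyogenes Cas9 located immediately 3′ of the protospacer, a 20-nucleotide protospacer, and a scan of the given strand only; the function name find_candidate_targets and its parameters are hypothetical, and other CRISPR enzymes would require a different PAM pattern.

```python
import re


def find_candidate_targets(seq: str, pam_regex: str = "[ACGT]GG",
                           protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each site on the given strand.

    Assumptions for this sketch: the PAM lies immediately 3' of the
    protospacer, and only the strand passed in is scanned.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    example = "A" * 20 + "TGG" + "CC"   # hypothetical sequence for illustration
    for start, protospacer, pam in find_candidate_targets(example):
        print(start, protospacer, pam)
```

The PAM pattern is passed as a parameter because, as noted above, the precise sequence and length requirements for the PAM differ depending on the CRISPR enzyme used.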

Indeed, in any aspect of the invention, the CRISPR complex may comprise a CRISPR enzyme complexed with a guide sequence hybridized to a target sequence, wherein said guide sequence may be linked to a tracr mate sequence which in turn may hybridize to a tracr sequence.

The invention relates to the engineering and optimization of systems, methods and compositions used for the control of gene expression involving sequence targeting, such as genome perturbation or gene-editing, that relate to the CRISPR-Cas system and components thereof. In advantageous embodiments, the Cas enzyme is Cas9. An advantage of the present methods is that the CRISPR system minimizes or avoids off-target binding and its resulting side effects. This is achieved using systems arranged to have a high degree of sequence specificity for the target DNA.

Crystallization of CRISPR-Cas9 and Crystal Structure

Crystallization of CRISPR-cas9 and Characterization of Crystal Structure: The crystals of the invention can be obtained by techniques of protein crystallography, including batch, liquid bridge, dialysis, vapor diffusion and hanging drop methods. Generally, the crystals of the invention are grown by dissolving substantially pure CRISPR-cas9 and a nucleic acid molecule to which it binds in an aqueous buffer containing a precipitant at a concentration just below that necessary to precipitate. Water is removed by controlled evaporation to produce precipitating conditions, which are maintained until crystal growth ceases.

Uses of the Crystals, Crystal Structure and Atomic Structure Co-Ordinates: The crystals of the invention, and particularly the atomic structure co-ordinates obtained therefrom, have a wide variety of uses. The crystals and structure co-ordinates are particularly useful for identifying compounds (nucleic acid molecules) that bind to CRISPR-cas9, and CRISPR-cas9s that can bind to particular compounds (nucleic acid molecules). Thus, the structure co-ordinates described herein can be used as phasing models in determining the crystal structures of additional synthetic or mutated CRISPR-cas9s, cas9s, nickases, binding domains. The provision of the crystal structure of CRISPR-cas9 complexed with a nucleic acid molecule as in the herein Crystal Structure Table and the Figures provides the skilled artisan with a detailed insight into the mechanisms of action of CRISPR-cas9. This insight provides a means to design modified CRISPR-cas9s, such as by attaching thereto a functional group, such as a repressor or activator. While one can attach a functional group such as a repressor or activator to the N or C terminal of CRISPR-cas9, the crystal structure demonstrates that the N terminal seems obscured or hidden, whereas the C terminal is more available for a functional group such as repressor or activator. Moreover, the crystal structure demonstrates that there is a flexible loop between approximately CRISPR-cas9 (S. pyogenes) residues 534-676 which is suitable for attachment of a functional group such as an activator or repressor. Attachment can be via a linker, e.g., a flexible glycine-serine (GlyGlyGlySer) (SEQ ID NO: 1) or (GGGS)3 (SEQ ID NO: 28) or a rigid alpha-helical linker such as (Ala(GluAlaAlaAlaLys)Ala) (SEQ ID NO: 29). In addition to the flexible loop there is also a nuclease or H3 region, an H2 region and a helical region. By “helix” or “helical” is meant a helix as known in the art, including, but not limited to, an alpha-helix.
Additionally, the term helix or helical may also be used to indicate a C-terminal helical element with an N-terminal turn.
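By way of illustration only, the attachment of a functional group via a linker, either at the C terminal or within the flexible loop at approximately residues 534-676, can be sketched at the level of amino acid sequence strings. The following Python sketch is not part of the disclosed methods. The linker strings are the one-letter forms of the (GGGS)3 and Ala(GluAlaAlaAlaLys)Ala linkers named above; the function names, the placeholder sequences, and the choice to flank a loop insertion with a linker on both sides are assumptions made for the example.

```python
FLEXIBLE_LINKER = "GGGS" * 3   # (GGGS)3 written in one-letter code
RIGID_LINKER = "AEAAAKA"       # Ala(GluAlaAlaAlaLys)Ala in one-letter code

LOOP_START, LOOP_END = 534, 676  # approximate flexible loop, 1-based residues


def fuse_c_terminal(cas9: str, domain: str, linker: str = FLEXIBLE_LINKER) -> str:
    """Append linker and functional domain to the C terminal of the protein."""
    return cas9 + linker + domain


def insert_in_loop(cas9: str, domain: str, after_residue: int,
                   linker: str = FLEXIBLE_LINKER) -> str:
    """Insert linker-domain-linker after a 1-based residue inside the loop.

    Flanking the domain with a linker on both sides is an assumption of this
    sketch, not a requirement stated in the text.
    """
    if not LOOP_START <= after_residue <= LOOP_END:
        raise ValueError("insertion point lies outside the flexible loop")
    if after_residue > len(cas9):
        raise ValueError("insertion point lies beyond the end of the sequence")
    return cas9[:after_residue] + linker + domain + linker + cas9[after_residue:]


if __name__ == "__main__":
    placeholder_cas9 = "M" + "A" * 700   # hypothetical stand-in, not a real Cas9
    placeholder_domain = "KRKR"          # hypothetical functional domain
    print(len(fuse_c_terminal(placeholder_cas9, placeholder_domain)))
    print(len(insert_in_loop(placeholder_cas9, placeholder_domain, 600)))
```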

The provision of the crystal structure of CRISPR-cas9 complexed with a nucleic acid molecule allows a novel approach for drug or compound discovery, identification, and design for compounds that can bind to CRISPR-cas9 and thus the invention provides tools useful in diagnosis, treatment, or prevention of conditions or diseases of multicellular organisms, e.g., algae, plants, invertebrates, fish, amphibians, reptiles, avians, mammals; for example domesticated plants, animals (e.g., production animals such as swine, bovine, chicken; companion animal such as felines, canines, rodents (rabbit, gerbil, hamster); laboratory animals such as mouse, rat), and humans. Accordingly, the invention provides a computer-based method of rational design of CRISPR-cas9 complexes. This rational design can comprise: providing the structure of the CRISPR-cas9 complex as defined by some or all (e.g., at least 2 or more, e.g., at least 5, advantageously at least 10, more advantageously at least 50 and even more advantageously at least 100 atoms of the structure) co-ordinates in the herein Crystal Structure Table and/or in Figure(s); providing a structure of a desired nucleic acid molecule as to which a CRISPR-cas9 complex is desired; and fitting the structure of the CRISPR-cas9 complex as defined by some or all co-ordinates in the herein Crystal Structure Table and/or in Figures to the desired nucleic acid molecule, including in said fitting obtaining putative modification(s) of the CRISPR-cas9 complex as defined by some or all co-ordinates in the herein Crystal Structure Table and/or in Figures for said desired nucleic acid molecule to bind for CRISPR-cas9 complex(es) involving the desired nucleic acid molecule.
The method or fitting of the method may use the co-ordinates of atoms of interest of the CRISPR-cas9 complex as defined by some or all co-ordinates in the herein Crystal Structure Table and/or in Figures which are in the vicinity of the active site or binding region (e.g., at least 2 or more, e.g., at least 5, advantageously at least 10, more advantageously at least 50 and even more advantageously at least 100 atoms of the structure) in order to model the vicinity of the active site or binding region. These co-ordinates may be used to define a space which is then screened “in silico” against a desired or candidate nucleic acid molecule. Thus, the invention provides a computer-based method of rational design of CRISPR-cas9 complexes. This method may include: providing the co-ordinates of at least two atoms of the herein Crystal Structure Table (“selected co-ordinates”); providing the structure of a candidate or desired nucleic acid molecule; and fitting the structure of the candidate to the selected co-ordinates. In this fashion, the skilled person may also fit a functional group and a candidate or desired nucleic acid molecule.
For example, providing the structure of the CRISPR-cas9 complex as defined by some or all (e.g., at least 2 or more, e.g., at least 5, advantageously at least 10, more advantageously at least 50 and even more advantageously at least 100 atoms of the structure) co-ordinates in the herein Crystal Structure Table and/or in Figure(s); providing a structure of a desired nucleic acid molecule as to which a CRISPR-cas9 complex is desired; fitting the structure of the CRISPR-cas9 complex as defined by some or all co-ordinates in the herein Crystal Structure Table and/or in Figures to the desired nucleic acid molecule, including in said fitting obtaining putative modification(s) of the CRISPR-cas9 complex as defined by some or all co-ordinates in the herein Crystal Structure Table and/or in Figures for said desired nucleic acid molecule to bind for CRISPR-cas9 complex(es) involving the desired nucleic acid molecule; selecting putative fit CRISPR-cas9-desired nucleic acid molecule complex(es), fitting such putative fit CRISPR-cas9-desired nucleic acid molecule complex(es) to the functional group (e.g., activator, repressor), e.g., as to locations for situating the functional group (e.g., positions within the flexible loop) and/or putative modifications of the putative fit CRISPR-cas9-desired nucleic acid molecule complex(es) for creating locations for situating the functional group. As alluded to, the invention can be practiced using co-ordinates in the herein Crystal Structure Table and/or in Figures which are in the vicinity of the active site or binding region; and therefore, the methods of the invention can employ a sub-domain of interest of the CRISPR-cas9 complex. Methods of the invention can be practiced using coordinates of a domain or sub-domain.
The methods can optionally include synthesizing the candidate or desired nucleic acid molecule and/or the CRISPR-cas9 systems from the “in silico” output and testing binding and/or activity of a “wet” or actual functional group linked to a “wet” or actual CRISPR-cas9 system bound to a “wet” or actual candidate or desired nucleic acid molecule. The methods can include synthesizing the CRISPR-cas9 systems (including a functional group) from the “in silico” output and testing binding and/or activity of a “wet” or actual functional group linked to a “wet” or actual CRISPR-cas9 system bound to an in vivo “wet” or actual candidate or desired nucleic acid molecule, e.g., contacting a “wet” or actual CRISPR-cas9 system including a functional group from the “in silico” output with a cell containing the desired or candidate nucleic acid molecule. These methods can include observing the cell or an organism containing the cell for a desired reaction, e.g., reduction of symptoms or condition or disease. The step of providing the structure of a candidate nucleic acid molecule may involve selecting the compound by computationally screening a database containing nucleic acid molecule data, e.g., such data as to conditions or diseases. A 3-D descriptor for binding of the candidate nucleic acid molecule may be derived from geometric and functional constraints derived from the architecture and chemical nature of the CRISPR-cas9 complex or domains or regions thereof from the herein crystal structure. In effect, the descriptor can be a type of virtual modification(s) of the CRISPR-cas9 complex crystal structure herein for binding CRISPR-cas9 to the candidate or desired nucleic acid molecule. The descriptor may then be used to interrogate the nucleic acid molecule database to ascertain those nucleic acid molecules of the database that have putatively good binding to the descriptor. The herein “wet” steps can then be performed using the descriptor and nucleic acid molecules that have putatively good binding.
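By way of illustration only, the step of using co-ordinates in the vicinity of the active site or binding region to define a space for “in silico” screening can be sketched as a distance-based selection of atoms. The following Python sketch is not part of the disclosed methods. It assumes that atoms are supplied as (label, x, y, z) tuples, which is not necessarily the layout of the herein Crystal Structure Table; the function name select_coordinates, the site center, and the radius are hypothetical. The minimum of two atoms mirrors the “at least two atoms” recited above.

```python
import math


def select_coordinates(atoms, center, radius, min_atoms=2):
    """Return the atoms lying within `radius` of `center`.

    atoms  : iterable of (label, x, y, z) tuples (assumed layout)
    center : (x, y, z) of the site of interest (assumed to be known)
    """
    selected = [a for a in atoms if math.dist(a[1:], center) <= radius]
    if len(selected) < min_atoms:
        raise ValueError("fewer than %d atoms within the chosen radius" % min_atoms)
    return selected


if __name__ == "__main__":
    # Hypothetical co-ordinates for illustration only.
    atoms = [
        ("N", 0.0, 0.0, 0.0),
        ("CA", 1.5, 0.0, 0.0),
        ("C", 2.0, 1.4, 0.0),
        ("O", 30.0, 30.0, 30.0),
    ]
    for atom in select_coordinates(atoms, center=(1.0, 0.0, 0.0), radius=5.0):
        print(atom)
```

The selected co-ordinates could then be passed to whatever fitting or docking procedure is used; that step is outside the scope of this sketch.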

“Fitting” can mean determining, by automatic or semi-automatic means, interactions between at least one atom of the candidate and at least one atom of the CRISPR-cas9 complex and calculating the extent to which such an interaction is stable. Interactions can include attraction and repulsion, brought about by charge, steric considerations, and the like. A “sub-domain” can mean at least one, e.g., one, two, three, or four, complete element(s) of secondary structure. Particular regions or domains of the CRISPR-cas9 include those identified in the herein Crystal Structure Table and the Figures.
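By way of illustration only, “fitting” as defined above can be sketched as a pairwise evaluation of interactions between atoms of a candidate and atoms of the complex. The following Python sketch is a toy score and not a validated force field or the scoring used by any particular docking program. It assumes atoms given as (x, y, z, charge) tuples, a Coulomb-like charge term, and a fixed penalty for steric clashes; the function name fit_score and all cut-off values are hypothetical. Lower scores indicate a more favorable (more stable) fit.

```python
import math


def fit_score(candidate, complex_atoms, clash=2.0, cutoff=8.0, clash_penalty=10.0):
    """Toy interaction score between two sets of (x, y, z, charge) atoms.

    - pairs farther apart than `cutoff` are ignored
    - pairs closer than `clash` add a fixed steric penalty (repulsion)
    - remaining pairs add q1*q2/r, negative when charges attract
    """
    score = 0.0
    for x1, y1, z1, q1 in candidate:
        for x2, y2, z2, q2 in complex_atoms:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            if r > cutoff:
                continue
            if r < clash:
                score += clash_penalty
            else:
                score += q1 * q2 / r
    return score


if __name__ == "__main__":
    # Hypothetical atoms for illustration only.
    candidate = [(0.0, 0.0, 0.0, -1.0)]
    print(fit_score(candidate, [(4.0, 0.0, 0.0, 1.0)]))   # attractive pair
    print(fit_score(candidate, [(1.0, 0.0, 0.0, 1.0)]))   # steric clash
```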

In any event, the determination of the three-dimensional structure of CRISPR-cas9 (S. pyogenes cas9) complex provides a basis for the design of new and specific nucleic acid molecules that bind to CRISPR-cas9 (e.g., S. pyogenes cas9), as well as the design of new CRISPR-cas9 systems, such as by way of modification of the CRISPR-cas9 system to bind to various nucleic acid molecules, by way of modification of the CRISPR-cas9 system to have linked thereto any one or more of various functional groups that may interact with each other, with the CRISPR-cas9 (e.g., an inducible system that provides for self-activation and/or self-termination of function), with the nucleic acid molecule or molecules (e.g., the functional group may be a regulatory or functional domain which may be selected from the group consisting of a transcriptional repressor, a transcriptional activator, a nuclease domain, a DNA methyl transferase, a protein acetyltransferase, a protein deacetylase, a protein methyltransferase, a protein deaminase, a protein kinase, and a protein phosphatase; and, in some aspects, the functional domain is an epigenetic regulator; see, e.g., Zhang et al., U.S. Pat. No. 8,507,272, and it is again mentioned that it and all documents cited herein and all appln cited documents are hereby incorporated herein by reference), by way of modification of cas9, by way of novel nickases. Indeed, the herewith CRISPR-cas9 (S. pyogenes cas9) crystal structure has a multitude of uses. For example, from knowing the three-dimensional structure of CRISPR-cas9 (S. pyogenes cas9) crystal structure, computer modelling programs may be used to design or identify different molecules expected to interact with possible or confirmed sites such as binding sites or other structural or functional features of the CRISPR-cas9 system (e.g., S. pyogenes cas9). Compounds that potentially bind (“binders”) can be examined through the use of computer modeling using a docking program.
Docking programs are known; for example GRAM, DOCK or AUTODOCK (see Walters et al. Drug Discovery Today, vol. 3, no. 4 (1998), 160-178, and Dunbrack et al. Folding and Design 2 (1997), 27-42). This procedure can include computer fitting of potential binders to ascertain how well the shape and the chemical structure of the potential binder will bind to a CRISPR-cas9 system (e.g., S. pyogenes cas9). Computer-assisted, manual examination of the active site or binding site of a CRISPR-cas9 system (e.g., S. pyogenes cas9) may be performed. Programs such as GRID (P. Goodford, J. Med. Chem, 1985, 28, 849-57), a program that determines probable interaction sites between molecules with various functional groups, may also be used to analyze the active site or binding site to predict partial structures of binding compounds. Computer programs can be employed to estimate the attraction, repulsion or steric hindrance of the two binding partners, e.g., CRISPR-cas9 system (e.g., S. pyogenes cas9) and a candidate nucleic acid molecule or a nucleic acid molecule and a candidate CRISPR-cas9 system (e.g., S. pyogenes cas9); and the CRISPR-cas9 crystal structure (S. pyogenes cas9) herewith enables such methods. Generally, the tighter the fit, the fewer the steric hindrances, and the greater the attractive forces, the more potent the potential binder, since these properties are consistent with a tighter binding constant. Furthermore, the more specificity in the design of a candidate CRISPR-cas9 system (e.g., S. pyogenes cas9), the more likely it is that it will not interact with off-target molecules as well. Also, “wet” methods are enabled by the instant invention. For example, in an aspect, the invention provides for a method for determining the structure of a binder (e.g., target nucleic acid molecule) of a candidate CRISPR-cas9 system (e.g., S. pyogenes cas9) bound to the candidate CRISPR-cas9 system (e.g., S. pyogenes cas9), said method comprising, (a) providing a first crystal of a candidate CRISPR-cas9 system (S. pyogenes cas9) according to the invention or a second crystal of a candidate CRISPR-cas9 system (e.g., S. pyogenes cas9), (b) contacting the first crystal or second crystal with said binder under conditions whereby a complex may form; and (c) determining the structure of said candidate (e.g., CRISPR-cas9 system (e.g., S. pyogenes cas9) or CRISPR-cas9 system (S. pyogenes cas9)) complex. The second crystal may have essentially the same coordinates discussed herein; however, due to minor alterations in the CRISPR-cas9 system (e.g., from the cas9 of such a system being e.g., S. pyogenes cas9 versus being S. pyogenes cas9, wherein “e.g., S. pyogenes cas9” indicates that the cas9 is a cas9 and can be of or derived from S. pyogenes or an ortholog thereof), the crystal may form in a different space group.

The invention further involves, in place of or in addition to “insilico” methods, other “wet” methods, including high throughputscreening of a binder (e.g., target nucleic acid molecule) and acandidate CRISPR-cas9 system (e.g., S. pyogenes cas9), or a candidatebinder (e.g., target nucleic acid molecule) and a CRISPR-cas9 system(e.g., S. pyogenes cas9), or a candidate binder (e.g., target nucleicacid molecule) and a candidate CRISPR-cas9 system (e.g., S. pyogenescas9) (the foregoing CRISPR-cas9 system(s) with or without one or morefunctional group(s)), to select compounds with binding activity. Thosepairs of binder and CRISPR-cas9 system which show binding activity maybe selected and further crystallized with the CRISPR-cas9 crystal havinga structure herein, e.g., by co-crystallization or by soaking, for X-rayanalysis. The resulting X-ray structure may be compared with that of theherein Crystal Structure Table and the information in the Figures for avariety of purposes, e.g., for areas of overlap. Having designed,identified, or selected possible pairs of binder and CRISPR-cas9 systemby determining those which have favorable fitting properties, e.g.,predicted strong attraction based on the pairs of binder and CRISPR-cas9crystral structure data herein, these possible pairs can then bescreened by “wet” methods for activity. Consequently, in an aspect theinvention can involve: obtaining or synthesizing the possible pairs; andcontacting a binder (e.g., target nucleic acid molecule) and a candidateCRISPR-cas9 system (e.g., S. pyogenes cas9), or a candidate binder(e.g., target nucleic acid molecule) and a CRISPR-cas9 system (e.g., S.pyogenes cas9), or a candidate binder (e.g., target nucleic acidmolecule) and a candidate CRISPR-cas9 system (e.g., S. pyogenes cas9)(the foregoing CRISPR-cas9 system(s) with or without one or morefunctional group(s)) to determine ability to bind. 
In the latter step, the contacting is advantageously under conditions to determine function. Instead of, or in addition to, performing such an assay, the invention may comprise: obtaining or synthesizing complex(es) from said contacting and analyzing the complex(es), e.g., by X-ray diffraction or NMR or other means, to determine the ability to bind or interact. Detailed structural information can then be obtained about the binding, and in light of this information, adjustments can be made to the structure or functionality of a candidate CRISPR-cas9 system or components thereof. These steps may be repeated and re-repeated as necessary. Alternatively or additionally, potential CRISPR-cas9 systems from or in the foregoing methods can be used with nucleic acid molecules in vivo, including without limitation by way of administration to an organism (including non-human animal and human) to ascertain or confirm function, including whether a desired outcome (e.g., reduction of symptoms, treatment) results therefrom.

The invention further involves a method of determining three dimensional structures of CRISPR-cas systems or complex(es) of unknown structure by using the structural co-ordinates of the herein Crystal Structure Table and the information in the Figures. For example, if X-ray crystallographic or NMR spectroscopic data are provided for a CRISPR-cas system or complex of unknown crystal structure, the structure of a CRISPR-cas9 complex as defined in the herein Crystal Structure Table and the Figures may be used to interpret that data to provide a likely structure for the unknown system or complex by such techniques as phase modeling in the case of X-ray crystallography. Thus, an inventive method can comprise: aligning a representation of the CRISPR-cas system or complex having an unknown crystal structure with an analogous representation of the CRISPR-cas(9) system and complex of the crystal structure herein to match homologous or analogous regions (e.g., homologous or analogous sequences); modeling the structure of the matched homologous or analogous regions (e.g., sequences) of the CRISPR-cas system or complex of unknown crystal structure based on the structure as defined in the herein Crystal Structure Table and/or in the Figures of the corresponding regions (e.g., sequences); and, determining a conformation (e.g., taking into consideration that favorable interactions should be formed so that a low energy conformation is formed) for the unknown crystal structure which substantially preserves the structure of said matched homologous regions. “Homologous regions” describes, for example as to amino acids, amino acid residues in two sequences that are identical or have similar, e.g., aliphatic, aromatic, polar, negatively charged, or positively charged, side-chain chemical groups. Homologous regions as to nucleic acid molecules can include at least 85% or 86% or 87% or 88% or 89% or 90% or 91% or 92% or 93% or 94% or 95% or 96% or 97% or 98% or 99% homology or identity. 
Identical and similar regions are sometimes described as being respectively “invariant” and “conserved” by those skilled in the art. Advantageously, the first and third steps are performed by computer modeling. Homology modeling is a technique that is well known to those skilled in the art (see, e.g., Greer, Science vol. 228 (1985) 1055, and Blundell et al. Eur J Biochem vol 172 (1988), 513). The computer representation of the conserved regions of the CRISPR-cas9 crystal structure herein and those of a CRISPR-cas system of unknown crystal structure aid in the prediction and determination of the crystal structure of the CRISPR-cas system of unknown crystal structure. Further still, the aspects of the invention which employ the CRISPR-cas9 crystal structure in silico may be equally applied to new CRISPR-cas crystal structures derived by using the herein CRISPR-cas9 crystal structure. In this fashion, a library of CRISPR-cas crystal structures can be obtained. Rational CRISPR-cas system design is thus provided by the instant invention. For instance, having determined a conformation or crystal structure of a CRISPR-cas system or complex, by the methods described herein, such a conformation may be used in the computer-based methods herein for determining the conformation or crystal structure of other CRISPR-cas systems or complexes whose crystal structures are yet unknown. Data from all of these crystal structures can be in a database, and the herein methods can be more robust by having herein comparisons involving the herein crystal structure or portions thereof be with respect to one or more crystal structures in the library. The invention further provides systems, such as computer systems, intended to generate structures and/or perform rational design of a CRISPR-cas system or complex. 
The system can contain: atomic co-ordinate data according to the herein Crystal Structure Table and the Figures or be derived therefrom e.g., by modeling, said data defining the three-dimensional structure of a CRISPR-cas system or complex or at least one domain or sub-domain thereof, or structure factor data therefor, said structure factor data being derivable from the atomic co-ordinate data of the herein Crystal Structure Table and the Figures. The invention also involves computer readable media with: atomic co-ordinate data according to the herein Crystal Structure Table and/or the Figures or derived therefrom e.g., by homology modeling, said data defining the three-dimensional structure of a CRISPR-cas system or complex or at least one domain or sub-domain thereof, or structure factor data therefor, said structure factor data being derivable from the atomic co-ordinate data of the herein Crystal Structure Table and/or the Figures. “Computer readable media” refers to any media which can be read and accessed directly by a computer, and includes, but is not limited to: magnetic storage media; optical storage media; electrical storage media; cloud storage and hybrids of these categories. By providing such computer readable media, the atomic co-ordinate data can be routinely accessed for modeling or other “in silico” methods. The invention further comprehends methods of doing business by providing access to such computer readable media, for instance on a subscription basis, via the Internet or a global communication/computer network; or, the computer system can be available to a user, on a subscription basis. A “computer system” refers to the hardware means, software means and data storage means used to analyze the atomic co-ordinate data of the present invention. The minimum hardware means of computer-based systems of the invention may comprise a central processing unit (CPU), input means, output means, and data storage means. 
Desirably, a display or monitor is provided to visualize structure data. The invention further comprehends methods of transmitting information obtained in any method or step thereof described herein or any information described herein, e.g., via telecommunications, telephone, mass communications, mass media, presentations, internet, email, etc. The crystal structures of the invention can be analyzed to generate Fourier electron density map(s) of CRISPR-cas systems or complexes; advantageously, the three-dimensional structure being as defined by the atomic co-ordinate data according to the herein Crystal Structure Table and/or the Figures. Fourier electron density maps can be calculated based on X-ray diffraction patterns. These maps can then be used to determine aspects of binding or other interactions. Electron density maps can be calculated using known programs such as those from the CCP4 computer package (Collaborative Computing Project, No. 4. The CCP4 Suite: Programs for Protein Crystallography, Acta Crystallographica, D50, 1994, 760-763). For map visualization and model building, programs such as “QUANTA” (1994, San Diego, Calif.: Molecular Simulations, Jones et al., Acta Crystallographica A47 (1991), 110-119) can be used.
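
By way of non-limiting illustration only, the calculation of a Fourier electron density map referred to above can be summarized by the standard textbook Fourier synthesis. The relation below is not recited in this specification and is offered as general background; the symbols V (unit cell volume), F(hkl) (structure factor amplitude), α(hkl) (phase) and (x, y, z) (fractional co-ordinates) are introduced here solely for the illustration.

```latex
\rho(x, y, z) \;=\; \frac{1}{V} \sum_{h} \sum_{k} \sum_{l}
  \left| F(hkl) \right| \, e^{\, i \alpha(hkl)} \, e^{-2 \pi i \,(h x + k y + l z)}
```

The amplitudes |F(hkl)| are obtainable from measured diffraction intensities, whereas the phases α(hkl) are not measured directly and must be estimated, for example from a related known structure, which is one reason the co-ordinate data herein may assist in interpreting data for a system of unknown structure.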

The herein Crystal Structure Table (see Example 1) gives atomic co-ordinate data for a CRISPR-cas9 (S. pyogenes), and lists each atom by a unique number; the chemical element and its position for each amino acid residue (as determined by electron density maps and antibody sequence comparisons), the amino acid residue in which the element is located, the chain identifier, the number of the residue, co-ordinates (e.g., X, Y, Z) which define with respect to the crystallographic axes the atomic position (in angstroms) of the respective atom, the occupancy of the atom in the respective position, “B”, isotropic displacement parameter (in angstroms²) which accounts for movement of the atom around its atomic center, and atomic number. See also the text herein and the Figures.
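
By way of non-limiting illustration only, the fields enumerated above resemble those of an ATOM record in the widely used fixed-column PDB file format. The following minimal Python sketch assumes, without confirmation from this specification, that the co-ordinate data are available in that format; the names AtomRecord, parse_atom_line and SAMPLE_LINE are hypothetical, and the sample values are invented and are not taken from the Crystal Structure Table.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    serial: int        # unique atom number
    name: str          # atom name, i.e. element and its position in the residue
    res_name: str      # amino acid residue in which the atom is located
    chain_id: str      # chain identifier
    res_seq: int       # residue number
    x: float           # co-ordinates in angstroms
    y: float
    z: float
    occupancy: float   # occupancy of the atom at this position
    b_factor: float    # isotropic displacement parameter "B" (angstroms squared)
    element: str       # element symbol


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-column PDB ATOM/HETATM line (assumed format)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# Hypothetical sample line built with explicit column widths; the values are
# invented for illustration and do not come from the Crystal Structure Table.
SAMPLE_LINE = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
    "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}          {:>2}"
).format("ATOM", 1, " N", "", "MET", "A", 1, "", 10.0, 20.5, -3.25, 1.0, 45.5, "N")

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE_LINE))
```

Fixed-column slicing is used rather than whitespace splitting because adjacent numeric fields in such records may abut without an intervening space.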

Nucleic Acids, Amino Acids and Proteins, Regulatory Sequences, Vectors, etc.

Nucleic acids, amino acids and proteins: The invention uses nucleic acids to bind target DNA sequences. This is advantageous as nucleic acids are much easier and cheaper to produce than proteins, and the specificity can be varied according to the length of the stretch where homology is sought. Complex 3-D positioning of multiple fingers, for example, is not required. The terms “polynucleotide”, “nucleotide”, “nucleotide sequence”, “nucleic acid” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. The term also encompasses nucleic-acid-like structures with synthetic backbones, see, e.g., Eckstein, 1991; Baserga et al., 1992; Milligan, 1993; WO 97/03211; WO 96/39154; Mata, 1997; Strauss-Soukup, 1997; and Samstag, 1996. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. 
A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component. As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms. A “wild type” can be a base line. As used herein the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature. The terms “non-naturally occurring” or “engineered” are used interchangeably and indicate the involvement of the hand of man. The terms, when referring to nucleic acid molecules or polypeptides, mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which they are naturally associated in nature and as found in nature. “Complementarity” refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base pairing or other non-traditional types. A percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions. 
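
By way of non-limiting illustration only, the percent complementarity defined above (e.g., 5 out of 10 residues being 50%) can be computed as in the following minimal Python sketch. The sketch assumes two equal-length strands read 5' to 3' and paired antiparallel, and counts only Watson-Crick pairs; the names WATSON_CRICK_PAIRS and percent_complementarity are hypothetical and the example sequences are invented.

```python
# Watson-Crick pairs for DNA and RNA bases. Non-traditional pairings are
# deliberately not counted in this sketch (an assumption of the example).
WATSON_CRICK_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def percent_complementarity(seq: str, target: str) -> float:
    """Percentage of residues in seq that pair with target (antiparallel)."""
    seq, target = seq.upper(), target.upper()
    if not seq or len(seq) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Position i of seq faces position (n - 1 - i) of target.
    paired = sum((a, b) in WATSON_CRICK_PAIRS
                 for a, b in zip(seq, reversed(target)))
    return 100.0 * paired / len(seq)


if __name__ == "__main__":
    # 10 of 10 residues pair.
    print(percent_complementarity("AAAAAAAAAA", "UUUUUUUUUU"))
    # 5 of 10 residues pair.
    print(percent_complementarity("AAAAAAAAAA", "UUUUUGGGGG"))
```

A fuller treatment would also handle strands of unequal length and the region lengths recited for “substantially complementary”; those are omitted here for brevity.
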
As used herein, “stringent conditions” for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes with the target sequence, and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent, and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques In Biochemistry And Molecular Biology-Hybridization With Nucleic Acid Probes Part I, Second Chapter “Overview of principles of hybridization and the strategy of nucleic acid probe assay”, Elsevier, N.Y. Where reference is made to a polynucleotide sequence, then complementary or partially complementary sequences are also envisaged. These are preferably capable of hybridising to the reference sequence under highly stringent conditions. Generally, in order to maximize the hybridization rate, relatively low-stringency hybridization conditions are selected: about 20 to 25° C. lower than the thermal melting point (T_m). The T_m is the temperature at which 50% of specific target sequence hybridizes to a perfectly complementary probe in solution at a defined ionic strength and pH. Generally, in order to require at least about 85% nucleotide complementarity of hybridized sequences, highly stringent washing conditions are selected to be about 5 to 15° C. lower than the T_m. In order to require at least about 70% nucleotide complementarity of hybridized sequences, moderately-stringent washing conditions are selected to be about 15 to 30° C. lower than the T_m. Highly permissive (very low stringency) washing conditions may be as low as 50° C. below the T_m, allowing a high level of mis-matching between hybridized sequences. 
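
By way of non-limiting illustration only, the washing temperature windows recited above can be expressed relative to a given T_m as in the following minimal Python sketch. The offsets are those stated in the preceding text; the names WASH_OFFSETS_C and wash_temperature_range are hypothetical, and the T_m of 80° C. used in the example is an invented value.

```python
# Offsets below T_m, in degrees Celsius, as recited in the text:
# highly stringent wash: about 5 to 15 below T_m (at least about 85% complementarity)
# moderately stringent wash: about 15 to 30 below T_m (at least about 70%)
WASH_OFFSETS_C = {
    "high": (5.0, 15.0),
    "moderate": (15.0, 30.0),
}


def wash_temperature_range(tm_celsius: float, stringency: str) -> tuple:
    """Return (lowest, highest) approximate wash temperature in Celsius."""
    try:
        nearest, farthest = WASH_OFFSETS_C[stringency]
    except KeyError:
        raise ValueError("stringency must be 'high' or 'moderate'") from None
    return (tm_celsius - farthest, tm_celsius - nearest)


if __name__ == "__main__":
    tm = 80.0  # hypothetical melting temperature, for illustration only
    print(wash_temperature_range(tm, "high"))      # 65 to 75 for this T_m
    print(wash_temperature_range(tm, "moderate"))  # 50 to 65 for this T_m
```

The sketch takes T_m as an input and does not estimate it, since T_m depends on sequence, ionic strength and pH as noted above.
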
Those skilled in the art will recognize that other physical and chemical parameters in the hybridization and wash stages can also be altered to affect the outcome of a detectable hybridization signal from a specific level of homology between target and probe sequences. Preferred highly stringent conditions comprise incubation in 50% formamide, 5×SSC, and 1% SDS at 42° C., or incubation in 5×SSC and 1% SDS at 65° C., with wash in 0.2×SSC and 0.1% SDS at 65° C. “Hybridization” refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence specific manner. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of PCR, or the cleavage of a polynucleotide by an enzyme. A sequence capable of hybridizing with a given sequence is referred to as the “complement” of the given sequence. As used herein, the term “genomic locus” or “locus” (plural loci) is the specific location of a gene or DNA sequence on a chromosome. A “gene” refers to stretches of DNA or RNA that encode a polypeptide or an RNA chain that has a functional role to play in an organism and hence is the molecular unit of heredity in living organisms. 
For the purpose of this invention it may be considered that genes include regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions. As used herein, “expression of a genomic locus” or “gene expression” is the process by which information from a gene is used in the synthesis of a functional gene product. The products of gene expression are often proteins, but in non-protein coding genes such as rRNA genes or tRNA genes, the product is functional RNA. The process of gene expression is used by all known life, namely eukaryotes (including multicellular organisms), prokaryotes (bacteria and archaea) and viruses, to generate functional products to survive. As used herein “expression” of a gene or nucleic acid encompasses not only cellular gene expression, but also the transcription and translation of nucleic acid(s) in cloning systems and in any other context. As used herein, “expression” also refers to the process by which a polynucleotide is transcribed from a DNA template (such as into an mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell. The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non amino acids. 
The terms also encompass an amino acid polymer that has been modified; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component. As used herein the term “amino acid” includes natural and/or unnatural or synthetic amino acids, including glycine and both the D or L optical isomers, and amino acid analogs and peptidomimetics. As used herein, the term “domain” or “protein domain” refers to a part of a protein sequence that may exist and function independently of the rest of the protein chain. As described in aspects of the invention, sequence identity is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more amino acid or nucleic acid sequences. In some preferred embodiments, the capping region of the dTALEs described herein have sequences that are at least 95% identical or share identity to the capping region amino acid sequences provided herein. Sequence homologies may be generated by any of a number of computer programs known in the art, for example BLAST or FASTA, etc. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A; Devereux et al., 1984, Nucleic Acids Research 12:387). Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 ibid, Chapter 18), FASTA (Altschul et al., 1990, J. Mol. Biol., 403-410) and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60). However it is preferred to use the GCG Bestfit program. 
Percentage (%) sequence homology may be calculated over contiguous sequences, i.e., one sequence is aligned with the other sequence and each amino acid or nucleotide in one sequence is directly compared with the corresponding amino acid or nucleotide in the other sequence, one residue at a time. This is called an “ungapped” alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues. Although this is a very simple and consistent method, it fails to take into consideration that, for example, in an otherwise identical pair of sequences, one insertion or deletion may cause the following amino acid residues to be put out of alignment, thus potentially resulting in a large reduction in % homology when a global alignment is performed. Consequently, most sequence comparison methods are designed to produce optimal alignments that take into consideration possible insertions and deletions without unduly penalizing the overall homology or identity score. This is achieved by inserting “gaps” in the sequence alignment to try to maximize local homology or identity. However, these more complex methods assign “gap penalties” to each gap that occurs in the alignment so that, for the same number of identical amino acids, a sequence alignment with as few gaps as possible, reflecting higher relatedness between the two compared sequences, may achieve a higher score than one with many gaps. “Affine gap costs” are typically used that charge a relatively high cost for the existence of a gap and a smaller penalty for each subsequent residue in the gap. This is the most commonly used gap scoring system. High gap penalties may, of course, produce optimized alignments with fewer gaps. Most alignment programs allow the gap penalties to be modified. 
However, it is preferred to use the default values when using such software for sequence comparisons. For example, when using the GCG Wisconsin Bestfit package the default gap penalty for amino acid sequences is −12 for a gap and −4 for each extension. Calculation of maximum % homology therefore first requires the production of an optimal alignment, taking into consideration gap penalties. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (Devereux et al., 1984 Nuc. Acids Research 12 p387). Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 Short Protocols in Molecular Biology, 4th Ed., Chapter 18), FASTA (Altschul et al., 1990 J. Mol. Biol. 403-410) and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999, Short Protocols in Molecular Biology, pages 7-58 to 7-60). However, for some applications, it is preferred to use the GCG Bestfit program. A new tool, called BLAST 2 Sequences, is also available for comparing protein and nucleotide sequences (see FEMS Microbiol Lett. 1999 174(2): 247-50; FEMS Microbiol Lett. 1999 177(1): 187-8 and the website of the National Center for Biotechnology Information at the website of the National Institutes of Health). Although the final % homology may be measured in terms of identity, the alignment process itself is typically not based on an all-or-nothing pair comparison. Instead, a scaled similarity score matrix is generally used that assigns scores to each pair-wise comparison based on chemical similarity or evolutionary distance. An example of such a matrix commonly used is the BLOSUM62 matrix, the default matrix for the BLAST suite of programs. GCG Wisconsin programs generally use either the public default values or a custom symbol comparison table, if supplied (see user manual for further details). 
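
By way of non-limiting illustration only, the gap scoring described above (a relatively high cost for the existence of a gap and a smaller penalty for each subsequent residue in the gap) can be sketched in Python as follows. The default values of −12 and −4 are those quoted above for the GCG Wisconsin Bestfit package; the sketch assumes that the opening penalty covers the first gapped residue and the extension penalty applies to each residue thereafter, which is one of several conventions and is not confirmed by this specification. The names affine_gap_penalty and total_gap_penalty are hypothetical.

```python
def affine_gap_penalty(gap_length: int,
                       gap_open: int = -12,
                       gap_extend: int = -4) -> int:
    """Penalty for one gap of the given length (assumed convention:
    gap_open for the first residue, gap_extend for each later residue)."""
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0
    return gap_open + gap_extend * (gap_length - 1)


def total_gap_penalty(gap_lengths, gap_open: int = -12,
                      gap_extend: int = -4) -> int:
    """Sum of the penalties over every gap in an alignment."""
    return sum(affine_gap_penalty(n, gap_open, gap_extend)
               for n in gap_lengths)


if __name__ == "__main__":
    # Four gapped residues as one gap versus four separate gaps: the single
    # gap is penalized less, so alignments with fewer gaps score higher.
    print(total_gap_penalty([4]))
    print(total_gap_penalty([1, 1, 1, 1]))
```

Under this convention the same number of gapped residues costs less when concentrated in one gap than when spread over several, which is the behaviour the preceding text attributes to such scoring systems.
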
For some applications, it is preferred to use the public default values for the GCG package, or in the case of other software, the default matrix, such as BLOSUM62. Alternatively, percentage homologies may be calculated using the multiple alignment feature in DNASIS™ (Hitachi Software), based on an algorithm analogous to CLUSTAL (Higgins D G & Sharp P M (1988), Gene 73(1), 237-244). Once the software has produced an optimal alignment, it is possible to calculate % homology, preferably % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result. The sequences may also have deletions, insertions or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent substance. Deliberate amino acid substitutions may be made on the basis of similarity in amino acid properties (such as polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues) and it is therefore useful to group amino acids together in functional groups. Amino acids may be grouped together based on the properties of their side chains alone. However, it is more useful to include mutation data as well. The sets of amino acids thus derived are likely to be conserved for structural reasons. These sets may be described in the form of a Venn diagram (Livingstone C. D. and Barton G. J. (1993) “Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation” Comput. Appl. Biosci. 9: 745-756) (Taylor W. R. (1986) “The classification of amino acid conservation” J. Theor. Biol. 119; 205-218). Conservative substitutions may be made, for example according to the table below which describes a generally accepted Venn diagram grouping of amino acids.

Set                                      Sub-set
Hydrophobic  F W Y H K M I L V A G C     Aromatic            F W Y H
                                         Aliphatic           I L V
Polar        W Y H K R E D C S T N Q     Charged             H K R E D
                                         Positively charged  H K R
                                         Negatively charged  E D
Small        V C A G S P T N D           Tiny                A G S
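
By way of non-limiting illustration only, the groupings in the table above can be held as sets and queried to find which groups two residues share, as in the following minimal Python sketch. The set memberships are transcribed from the table; the names AMINO_ACID_SETS and shared_sets are hypothetical, and the sketch does not itself decide which shared groups make a substitution conservative.

```python
# Set memberships transcribed from the table (one-letter amino acid codes).
AMINO_ACID_SETS = {
    "hydrophobic": set("FWYHKMILVAGC"),
    "aromatic": set("FWYH"),
    "aliphatic": set("ILV"),
    "polar": set("WYHKREDCSTNQ"),
    "charged": set("HKRED"),
    "positively charged": set("HKR"),
    "negatively charged": set("ED"),
    "small": set("VCAGSPTND"),
    "tiny": set("AGS"),
}


def shared_sets(residue_a: str, residue_b: str) -> list:
    """Names of every set in the table that contains both residues."""
    a, b = residue_a.upper(), residue_b.upper()
    return sorted(name for name, members in AMINO_ACID_SETS.items()
                  if a in members and b in members)


if __name__ == "__main__":
    print(shared_sets("K", "R"))  # basic for basic
    print(shared_sets("I", "L"))  # aliphatic for aliphatic
    print(shared_sets("E", "F"))  # no set in common
```

A like-for-like substitution of the kind discussed herein corresponds to a pair of residues sharing one of the narrower sub-sets, for example “positively charged” or “aliphatic”.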

Embodiments of the invention include sequences (both polynucleotide or polypeptide) which may comprise homologous substitution (substitution and replacement are both used herein to mean the interchange of an existing amino acid residue or nucleotide, with an alternative residue or nucleotide) that may occur i.e., like-for-like substitution in the case of amino acids such as basic for basic, acidic for acidic, polar for polar, etc. Non-homologous substitution may also occur i.e., from one class of residue to another or alternatively involving the inclusion of unnatural amino acids such as ornithine (hereinafter referred to as Z), diaminobutyric acid (hereinafter referred to as B), norleucine (hereinafter referred to as O), pyridylalanine, thienylalanine, naphthylalanine and phenylglycine. Variant amino acid sequences may include suitable spacer groups that may be inserted between any two amino acid residues of the sequence including alkyl groups such as methyl, ethyl or propyl groups in addition to amino acid spacers such as glycine or β-alanine residues. A further form of variation, which involves the presence of one or more amino acid residues in peptoid form, may be well understood by those skilled in the art. For the avoidance of doubt, “the peptoid form” is used to refer to variant amino acid residues wherein the α-carbon substituent group is on the residue's nitrogen atom rather than the α-carbon. Processes for preparing peptides in the peptoid form are known in the art, for example Simon R J et al., PNAS (1992) 89(20), 9367-9371 and Horwell D C, Trends Biotechnol. (1995) 13(4), 132-134.

For purposes of this invention, amplification means any method employing a primer and a polymerase capable of replicating a target sequence with reasonable fidelity. Amplification may be carried out by natural or recombinant DNA polymerases such as TaqGold™, T7 DNA polymerase, Klenow fragment of E. coli DNA polymerase, and reverse transcriptase. A preferred amplification method is PCR.

In certain aspects the invention involves vectors. As used herein, a “vector” is a tool that allows or facilitates the transfer of an entity from one environment to another. It is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment may be inserted so as to bring about the replication of the inserted segment. Generally, a vector is capable of replication when associated with the proper control elements. In general, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, or no free ends (e.g. circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. One type of vector is a “plasmid,” which refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g. retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses (AAVs)). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g. bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively-linked. 
Such vectors are referred to herein as “expression vectors.” Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids.

Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively-linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g. in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). With regard to recombination and cloning methods, mention is made of U.S. patent application Ser. No. 10/815,730, published Sep. 2, 2004 as US 2004-0171156 A1, the contents of which are herein incorporated by reference in their entirety.

Aspects of the invention relate to bicistronic vectors for chimeric RNA and Cas9. Bicistronic expression vectors for chimeric RNA and Cas9 are preferred. In general and particularly in this embodiment Cas9 is preferably driven by the CBh promoter. The chimeric RNA may preferably be driven by a Pol III promoter, such as a U6 promoter. Ideally the two are combined. The chimeric guide RNA typically consists of a 20 bp guide sequence (Ns) and this may be joined to the tracr sequence (running from the first “U” of the lower strand to the end of the transcript). The tracr sequence may be truncated at various positions as indicated. The guide and tracr sequences are separated by the tracr-mate sequence, which may be GUUUUAGAGCUA (SEQ ID NO: 30). This may be followed by the loop sequence GAAA as shown. Both of these are preferred examples. Applicants have demonstrated Cas9-mediated indels at the human EMX1 and PVALB loci by SURVEYOR assays. ChiRNAs are indicated by their “+n” designation, and crRNA refers to a hybrid RNA where guide and tracr sequences are expressed as separate transcripts. Throughout this application, chimeric RNA may also be called single guide, or synthetic guide RNA (sgRNA). The loop is preferably GAAA, but it is not limited to this sequence or indeed to being only 4 bp in length. Indeed, preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG. 
Inpracticing any of the methods disclosed herein, a suitable vector can beintroduced to a cell or an embryo via one or more methods known in theart, including without limitation, microinjection, electroporation,sonoporation, biolistics, calcium phosphate-mediated transfection,cationic transfection, liposome transfection, dendrimer transfection,heat shock transfection, nucleofection transfection, magnetofection,lipofection, impalefection, optical transfection, proprietaryagent-enhanced uptake of nucleic acids, and delivery via liposomes,immunoliposomes, virosomes, or artificial virions. In some methods, thevector is introduced into an embryo by microinjection. The vector orvectors may be microinjected into the nucleus or the cytoplasm of theembryo. In some methods, the vector or vectors may be introduced into acell by nucleofection.
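
The chimeric RNA layout described earlier in this section (guide sequence, then tracr-mate sequence, then loop, then tracr sequence) can be illustrated as a simple string assembly. The following is a minimal sketch under stated assumptions, not Applicants' design procedure: the function names `to_rna` and `build_chimeric_rna` are hypothetical, the example guide is an arbitrary 20-mer, and the example tracr fragment is an illustrative placeholder that is not taken from the specification. Only the tracr-mate sequence (SEQ ID NO: 30) and the GAAA loop come from the text.

```python
TRACR_MATE = "GUUUUAGAGCUA"  # tracr-mate sequence given in the text (SEQ ID NO: 30)
LOOP = "GAAA"                # preferred loop sequence given in the text
GUIDE_LENGTH = 20            # typical guide length given in the text


def to_rna(seq):
    """Upper-case a sequence, convert T to U, and reject non-ACGU characters."""
    rna = seq.upper().replace("T", "U")
    bad = set(rna) - set("ACGU")
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    return rna


def build_chimeric_rna(guide, tracr, tracr_mate=TRACR_MATE, loop=LOOP):
    """Concatenate guide + tracr-mate + loop + tracr (hypothetical helper)."""
    guide = to_rna(guide)
    if len(guide) != GUIDE_LENGTH:
        raise ValueError(f"guide must be {GUIDE_LENGTH} nt, got {len(guide)}")
    return guide + to_rna(tracr_mate) + to_rna(loop) + to_rna(tracr)


# Arbitrary example inputs; neither sequence is taken from the specification.
example_guide = "ACGTACGTACGTACGTACGT"
example_tracr = "UAGCAAGUUAAAAU"  # illustrative placeholder only
print(build_chimeric_rna(example_guide, example_tracr))
```

Passing `loop="CAAA"` or `loop="AAAG"` substitutes one of the alternative loop forming sequences named in the text; a truncated tracr sequence is modeled simply by passing a shorter `tracr` string.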

The term “regulatory element” is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g. transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g. liver, pancreas), or particular cell types (e.g. lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle dependent or developmental stage-dependent manner, which may or may not also be tissue or cell-type specific. In some embodiments, a vector comprises one or more pol III promoters (e.g. 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g. 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g. 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, U6 and H1 promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al, Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter. Also encompassed by the term “regulatory element” are enhancer elements, such as WPRE; CMV enhancers; the R-U5′ segment in LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), p. 466-472, 1988); SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), p. 1527-31, 1981). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc. A vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., clustered regularly interspaced short palindromic repeats (CRISPR) transcripts, proteins, enzymes, mutant forms thereof, fusion proteins thereof, etc.). With regard to regulatory sequences, mention is made of U.S. patent application Ser. No. 10/491,026, the contents of which are incorporated by reference herein in their entirety. With regard to promoters, mention is made of PCT publication WO 2011/028929 and U.S. application Ser. No. 12/511,940, the contents of which are incorporated by reference herein in their entirety.

Vectors can be designed for expression of CRISPR transcripts (e.g. nucleic acid transcripts, proteins, or enzymes) in prokaryotic or eukaryotic cells. For example, CRISPR transcripts can be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells, or mammalian cells. Suitable host cells are discussed further in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

Vectors may be introduced and propagated in a prokaryote or prokaryotic cell. In some embodiments, a prokaryote is used to amplify copies of a vector to be introduced into a eukaryotic cell or as an intermediate vector in the production of a vector to be introduced into a eukaryotic cell (e.g. amplifying a plasmid as part of a viral vector packaging system). In some embodiments, a prokaryote is used to amplify copies of a vector and express one or more nucleic acids, such as to provide a source of one or more proteins for delivery to a host cell or host organism. Expression of proteins in prokaryotes is most often carried out in Escherichia coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, such as to the amino terminus of the recombinant protein. Such fusion vectors may serve one or more purposes, such as: (i) to increase expression of recombinant protein; (ii) to increase the solubility of the recombinant protein; and (iii) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety subsequent to purification of the fusion protein. Enzymes that cleave at such sites, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase. Example fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein. Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., (1988) Gene 69:301-315) and pET 11d (Studier et al., GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990) 60-89). In some embodiments, a vector is a yeast expression vector. Examples of vectors for expression in yeast Saccharomyces cerevisiae include pYepSec1 (Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30: 933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ (Invitrogen Corp, San Diego, Calif.). In some embodiments, a vector drives protein expression in insect cells using baculovirus expression vectors. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g., SF9 cells) include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the pVL series (Luckow and Summers, 1989. Virology 170: 31-39).

In some embodiments, a vector is capable of driving expression of one or more sequences in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO J. 6: 187-195). When used in mammalian cells, the expression vector's control functions are typically provided by one or more regulatory elements. For example, commonly used promoters are derived from polyoma, adenovirus 2, cytomegalovirus, simian virus 40, and others disclosed herein and known in the art. For other suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of Sambrook, et al., MOLECULAR CLONING: A LABORATORY MANUAL. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

In some embodiments, the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert, et al., 1987. Genes Dev. 1: 268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43: 235-275), in particular promoters of T cell receptors (Winoto and Baltimore, 1989. EMBO J. 8: 729-733) and immunoglobulins (Banerji, et al., 1983. Cell 33: 729-740; Queen and Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament promoter; Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477), pancreas-specific promoters (Edlund, et al., 1985. Science 230: 912-916), and mammary gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally-regulated promoters are also encompassed, e.g., the murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) and the α-fetoprotein promoter (Camper and Tilghman, 1989. Genes Dev. 3: 537-546). With regard to these prokaryotic and eukaryotic vectors, mention is made of U.S. Pat. No. 6,750,059, the contents of which are incorporated by reference herein in their entirety. Other embodiments of the invention may relate to the use of viral vectors, with regard to which mention is made of U.S. patent application Ser. No. 13/092,085, the contents of which are incorporated by reference herein in their entirety. Tissue-specific regulatory elements are known in the art and in this regard, mention is made of U.S. Pat. No. 7,776,321, the contents of which are incorporated by reference herein in their entirety.

In some embodiments, a regulatory element is operably linked to one or more elements of a CRISPR system so as to drive expression of the one or more elements of the CRISPR system. In general, CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats), also known as SPIDRs (SPacer Interspersed Direct Repeats), constitute a family of DNA loci that are usually specific to a particular bacterial species. The CRISPR locus comprises a distinct class of interspersed short sequence repeats (SSRs) that were recognized in E. coli (Ishino et al., J. Bacteriol., 169:5429-5433 [1987]; and Nakata et al., J. Bacteriol., 171:3553-3556 [1989]), and associated genes. Similar interspersed SSRs have been identified in Haloferax mediterranei, Streptococcus pyogenes, Anabaena, and Mycobacterium tuberculosis (See, Groenen et al., Mol. Microbiol., 10:1057-1065 [1993]; Hoe et al., Emerg. Infect. Dis., 5:254-263 [1999]; Masepohl et al., Biochim. Biophys. Acta 1307:26-30 [1996]; and Mojica et al., Mol. Microbiol., 17:85-93 [1995]). The CRISPR loci typically differ from other SSRs by the structure of the repeats, which have been termed short regularly spaced repeats (SRSRs) (Jansen et al., OMICS J. Integ. Biol., 6:23-33 [2002]; and Mojica et al., Mol. Microbiol., 36:244-246 [2000]). In general, the repeats are short elements that occur in clusters that are regularly spaced by unique intervening sequences with a substantially constant length (Mojica et al., [2000], supra). Although the repeat sequences are highly conserved between strains, the number of interspersed repeats and the sequences of the spacer regions typically differ from strain to strain (van Embden et al., J. Bacteriol., 182:2393-2401 [2000]). CRISPR loci have been identified in more than 40 prokaryotes (See e.g., Jansen et al., Mol. Microbiol., 43:1565-1575 [2002]; and Mojica et al., [2005]) including, but not limited to Aeropyrum, Pyrobaculum, Sulfolobus, Archaeoglobus, Haloarcula, Methanobacterium, Methanococcus, Methanosarcina, Methanopyrus, Pyrococcus, Picrophilus, Thermoplasma, Corynebacterium, Mycobacterium, Streptomyces, Aquifex, Porphyromonas, Chlorobium, Thermus, Bacillus, Listeria, Staphylococcus, Clostridium, Thermoanaerobacter, Mycoplasma, Fusobacterium, Azoarcus, Chromobacterium, Neisseria, Nitrosomonas, Desulfovibrio, Geobacter, Myxococcus, Campylobacter, Wolinella, Acinetobacter, Erwinia, Escherichia, Legionella, Methylococcus, Pasteurella, Photobacterium, Salmonella, Xanthomonas, Yersinia, Treponema, and Thermotoga.

In some embodiments, the CRISPR enzyme is part of a fusion protein comprising one or more heterologous protein domains (e.g. about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the CRISPR enzyme). A CRISPR enzyme fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains. Examples of protein domains that may be fused to a CRISPR enzyme include, without limitation, epitope tags, reporter gene sequences, and protein domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity and nucleic acid binding activity. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP). A CRISPR enzyme may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or binds other cellular molecules, including but not limited to maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Additional domains that may form part of a fusion protein comprising a CRISPR enzyme are described in US20110059502, incorporated herein by reference. In some embodiments, a tagged CRISPR enzyme is used to identify the location of a target sequence.

In some embodiments, a CRISPR enzyme may form a component of an inducible system. The inducible nature of the system would allow for spatiotemporal control of gene editing or gene expression using a form of energy. The form of energy may include but is not limited to electromagnetic radiation, sound energy, chemical energy and thermal energy. Examples of inducible systems include tetracycline inducible promoters (Tet-On or Tet-Off), small molecule two-hybrid transcription activation systems (FKBP, ABA, etc), or light inducible systems (Phytochrome, LOV domains, or cryptochrome). In one embodiment, the CRISPR enzyme may be a part of a Light Inducible Transcriptional Effector (LITE) to direct changes in transcriptional activity in a sequence-specific manner. The components of a LITE may include a CRISPR enzyme, a light-responsive cryptochrome heterodimer (e.g. from Arabidopsis thaliana), and a transcriptional activation/repression domain. Further examples of inducible DNA binding proteins and methods for their use are provided in U.S. 61/736,465 and U.S. 61/721,283, which are hereby incorporated by reference in their entirety.

The practice of the present invention employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are within the skill of the art. See Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel, et al. eds., (1987)); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.): PCR 2: A PRACTICAL APPROACH (M. J. MacPherson, B. D. Hames and G. R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL, and ANIMAL CELL CULTURE (R. I. Freshney, ed. (1987)).

EXAMPLES

The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. The present examples, along with the methods described herein, are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.

Example 1 Crystal Structure

FIGS. 1A-M provide: various views of the CRISPR-Cas complex crystal structure (A-I), chimeric RNA architecture from the crystal structure (J-K), an interaction schematic from the crystal structure (L) and a topology schematic from the crystal structure (M).

FIGS. 1J-K concern a SpCas9 sgRNA structural study, and FIGS. 4A-B also pertain to sgRNA mutations. SpCas9 sgRNAs were mutated to investigate the contribution of specific bases or groups of bases to activity. These include mutations in the direct repeat (DR) and tracrRNA regions of the sgRNA, divided into: stem 1 (base-pairing region between DR and tracrRNA), bulge (un-paired bases between DR and tracrRNA), loop 1 (artificial GAAA connector between DR and tracrRNA), linker 1 (between stem 1 and stem 2), stem 2 (first hairpin formed by tracrRNA tail), loop 2 (loop in between stem 2), stem 3 (second, or last hairpin formed by tracrRNA tail), and loop 3 (loop in between stem 3). Mutations were chosen based on predicted secondary structure as well as secondary structure as illustrated in FIGS. 1A-M, especially FIG. 1J. In addition, three (3) sgRNA scaffolds were designed to incorporate MS2 loops in loop regions for interaction/binding to recruit functional domains fused to MBP. sgRNAs were synthesized as U6::PCR amplicon and tested in co-transfection with wildtype SpCas9.

400 ng of Cas9 plasmid and 100 ng of sgRNA were co-transfected into 200,000 HEK 293FT cells with Lipofectamine 2000; DNA was harvested 3 days post-transfection for SURVEYOR analysis.
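
As a quick check on the quantities stated above, the total DNA, the Cas9:sgRNA mass ratio, and the average DNA mass per cell follow from simple arithmetic. The sketch below only restates those three numbers; the constant and function names are hypothetical, and the per-cell figure is a nominal average that assumes the DNA is spread evenly over all 200,000 cells, which the text does not claim.

```python
CAS9_PLASMID_NG = 400   # Cas9 plasmid mass from the text
SGRNA_NG = 100          # sgRNA mass from the text
CELLS = 200_000         # HEK 293FT cells per transfection from the text


def picograms_per_cell(total_ng, cells):
    """Nominal average DNA mass per cell in picograms (1 ng = 1000 pg)."""
    return total_ng * 1000 / cells


total_ng = CAS9_PLASMID_NG + SGRNA_NG
mass_ratio = CAS9_PLASMID_NG / SGRNA_NG

print(f"total DNA: {total_ng} ng")
print(f"Cas9:sgRNA mass ratio: {mass_ratio:g}:1")
print(f"nominal DNA per cell: {picograms_per_cell(total_ng, CELLS):g} pg")
```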

The invention comprehends a CRISPR-Cas9 (S. pyogenes) system having a crystal having the structure defined by the co-ordinates of the following Table A (the CRISPR-Cas9 crystal structure). Table A discloses SEQ ID NOS 180-202, respectively, in order of appearance.

Lengthy table referenced here: US20200080067A1-20200312-T00001. Please refer to the end of the specification for access instructions.

Example 2 S. pyogenes (Sp) SpCas9 Truncations from Crystal Structure

FIGS. 3A-B pertain to SpCas9 truncations from full length SpCas9. These figures show Surveyor gel test results of SpCas9 truncation mutants from the crystal structure that retain cleavage activity (A) and a table showing the amino acid truncations and flexible (GGGS) (SEQ ID NO: 1) or rigid (A(EAAAK)) (SEQ ID NO: 2) linker substitutions of the lanes of the gels of FIG. 3A (B).

In this Example, SpCas9 sequences were analyzed by (1) comparing against orthologs (S. aureus, S. thermophilus CRISPR1, S. thermophilus CRISPR3, and N. meningitidis), including smaller Cas9s (S. aureus, S. thermophilus CRISPR1, and N. meningitidis), for regions that are conserved or variable, and (2) identifying boundaries indicated by crystallography as being potentially non-critical for contacting the target DNA:sgRNA duplex. A region of SpCas9 (helical domain 2) was not present in many smaller Cas9 orthologs, and was predicted to be dispensable for function. Two similar sets of truncations were made, one by sequence alignment with smaller Cas9s, one by crystal prediction. In addition, several sets of flexible glycine-serine (GlyGlyGlySer) (SEQ ID NO: 1) or rigid alpha-helical linkers (Ala(GluAlaAlaAlaLys)Ala) (SEQ ID NO: 29) in groups of 3, 6, 9, or 12 repeats were also used to replace helical domain 2 for potential structural stabilization and/or to help retain SpCas9:sgRNA specificity. All of the helical domain 2 truncations and linker substitutions retained SpCas9 activity. SpCas9 was truncated systematically in the Helical 1, 2, and 3 domains, as well as the C′-terminal putative PAM-recognizing domain. Truncation mutants were transfected into HEK 293FT cells as follows: 400 ng of truncation Cas9 plasmid and 100 ng of sgRNA were co-transfected into 200,000 cells by Lipofectamine 2000. DNA from the cells was harvested for SURVEYOR analysis.
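
The repeat-based linkers described above, and the DNA that encodes them, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not Applicants' construct-design procedure: the helper names `translate` and `repeat_linker` are hypothetical, and the rigid linker is assumed to carry a single Ala on each side of the repeated EAAAK unit, following the (Ala(GluAlaAlaAlaLys)Ala) notation. The flexible unit is passed as a parameter because the text gives GlyGlyGlySer while construct 7 in the sequence listing below is labeled (GGGGS)3. The `translate` helper applies the standard genetic code to the linker-encoding DNA that appears in construct 7 and to the N′-terminal NLS sequence (SEQ ID NO: 32).

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(dna):
    """Translate an in-frame DNA string (any case) into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def repeat_linker(unit, repeats, cap=""):
    """Return cap + unit repeated `repeats` times + cap (hypothetical helper)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return cap + unit * repeats + cap


REPEAT_COUNTS = (3, 6, 9, 12)  # group sizes named in the text
flexible = {n: repeat_linker("GGGS", n) for n in REPEAT_COUNTS}
rigid = {n: repeat_linker("EAAAK", n, cap="A") for n in REPEAT_COUNTS}

# Linker-encoding DNA as it appears in construct 7 of the sequence listing.
linker_dna = "GGTGGCGGTGGCtcg" * 3
# N'-terminal NLS coding sequence (SEQ ID NO: 32).
nls_dna = "ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCC"

print(flexible[3])            # GGGSGGGSGGGS
print(rigid[3])               # AEAAAKEAAAKEAAAKA
print(translate(linker_dna))  # GGGGSGGGGSGGGGS
print(translate(nls_dna))     # MAPKKKRKVGIHGVPAA
```

Translating the construct 7 linker DNA gives three GGGGS repeats, consistent with its "(GGGGS)3" label; calling `repeat_linker("GGGGS", 3)` reproduces the same peptide.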

Below: full length SpCas9 DNA sequence and sequences of the subdomains; followed by helical domain 2 truncation and variants.

>Full length NLS-SpCas9-NLS (SEQ ID NO: 31)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGA
AAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGA
GAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa >N′-terminal NLS(SEQ ID NO: 32)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCC >RuvCI domain(SEQ ID NO: 33) GACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACA >Bridging helix (SEQ ID NO: 34)GCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTC >Helical domain 1 (SEQ ID NO: 35)AGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGAC >Helical domain 2 (dispensable)(SEQ ID NO: 36) CTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAG >Helical domain 3 (SEQ ID NO: 
37)ATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCC >Flexible linker (SEQ ID NO: 38)CAGGTGTCCGGCCAGGGCGAT >RuvC II (SEQ ID NO: 39)ATCGTGATCGAAATGGCCAGAGAG >HNH (SEQ ID NO: 40)GACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAAC >RuvCIII (SEQ ID NO: 41)CACCACGCCCACGACGCCTACCTG >C-terminal (PAM recognizing domain)(SEQ ID NO: 42) 
ACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGAC C′-NLS (SEQ ID NO: 43)AAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG 6. Sp_Δ_hel 2(174-311) helical domain 2 deletion (from orthologalignment) (SEQ ID NO: 44)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACATCACCAAGGCaCCaCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGG
ATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTA
CGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa 7. Sp_Δ_hel 2-(GGGGS)3 helical domain 2 deletion (from orthologalignment)(“(GGGGS)3” disclosed as SEQ ID NO: 45) (SEQ ID NO: 46)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGA
ACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTG
CTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa 8. Sp_Δ_hel 2-(GGGGS)6 helical domain 2 deletion (from orthologalignment)(“(GGGGS)6” disclosed as SEQ ID NO: 47) (SEQ ID NO: 48)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCAC
CCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCT
ATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAG AAAAAGtaa 9. Sp_Δ_hel 2-(GGGGS)9 helical domain 2 deletion (from orthologalignment)(“(GGGGS)9” disclosed as SEQ ID NO: 49) (SEQ ID NO: 50)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGC
CAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAA
AGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa10. Sp_Δ_hel 2-(GGGGS)12 helical domain 2 deletion (from orthologalignment)(“(GGGGS)12” disclosed as SEQ ID NO: 51) (SEQ ID NO: 52)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgGGTGGCGGTGGCtcgATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTC
TGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATG
CCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa 11. Sp_Δ_hel 2-A(EAAAK)3A helical domain 2 deletion (from ortholog alignment) (“A(EAAAK)3A” disclosed as SEQ ID NO: 53) (SEQ ID NO: 
54)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAgctATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCA
GAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa12. 
Sp_Δ_hel 2-A(EAAAK)3ALEA(EAAAK)3A helical domain 2 deletion(from ortholog alignment)(“A(EAAAK)3ALEA(EAAAK)3A”disclosed as SEQ ID NO: 55) (SEQ ID NO: 56)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACgctGAAGCCGCTGCTAAAGAAGCcGCTGCTAAAGAAGCcGCTGCTAAAGccCTGGAGgctGAAGCcGCTGCTAAAGAAGCcGCTGCTAAAGAAGCCGCTGCTAAAgctATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTG
ACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAA
AAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa13. Sp_Δ_hel 2-A(EAAAK)3ALEA(EAAAK)3ALEA(EAAAK)3A helical domain2 deletion (from ortholog alignment)(“A(EAAAK)3ALEA(EAAAK)3ALEA(EAAAK)3A” disclosed as SEQ ID NO: 57) (SEQ ID NO: 58)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAgctATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACC
GGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTT
CAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa14. Sp_del_hel 2-A(EAAAK)3LE(EAAAK)3LE(EAAAK)3LE(EAAAK)3A helicaldomain 2 deletion (from ortholog alignment)(“A(EAAAK)3LE(EAAAK)3LE(EAAAK)3LE(EAAAK)3A” disclosed as SEQ ID NO: 59) (SEQ ID NO: 60)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAgctATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACG
CCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAA
CAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa 30. Sp_del (175-307)(from crystal data)(SEQ ID NO: 61) ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGG
CTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCAT
CAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa31. Sp_del (1098-end) (SEQ ID NO: 62)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACC
AACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa32. 
Sp_del (175-307)-(GGGGS)3 (“(GGGGS)3” disclosed as SEQ ID NO: 45)(SEQ ID NO: 63) ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAA
GGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa33. 
Sp_del (175-307)-(GGGGS)6 (“(GGGGS)6” disclosed as SEQ ID NO: 47)(SEQ ID NO: 64) ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCT
GGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa 34. 
Sp_del (175-307)-(GGGGS)9 (“(GGGGS)9”disclosed as SEQ ID NO: 49) (SEQ ID NO: 65)ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAG
GTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCA
GGCAAAAAAGAAAAAGtaa 35. Sp_del (175-307)-(GGGGS)12 (“(GGGGS)12” disclosed as SEQ ID NO: 51) (SEQ ID NO: 66) ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGGGTGGaGGTGGttcgGGTGGCGGTGGCtcgGGTGGaGGTGGatcgGGTGGCGGTGGttcgGGTGGaGGTGGCtcgGGcGGaGGTGGatcgGGTGGCGGTGGCtcgGGTGGaGGTGGCtcgGGTGGaGGTGGCtcgGGTGGCGGTGGatcgGGTGGaGGTGGatcgGGTGGaGGTGGttcgGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTC
GCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCT
GTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa36. Sp_del(175-307)-A(EAAAK)3A (“(EAAAK)3A” disclosed as SEQ ID NO: 53)(SEQ ID NO: 67) ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAgctGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACA
TCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACG
AAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa 37. Sp_del(175-307)-A(EAAAK)3ALEA(EAAAK)3A (“A(EAAAK)3ALEA(EAAAK)3A” disclosed as SEQ ID NO: 55) (SEQ ID NO: 68) ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGgctGAAGCCGCTGCTAAAGAAGCcGCTGCTAAAGAAGCcGCTGCTAAAGccCTGGAGgctGAAGCcGCTGCTAAAGAAGCcGCTGCTAAAGAAGCCGCTGCTAAAgctGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGAC
GACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGG
AGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa38. Sp_del(175-307)-A(EAAAK)3ALEA(EAAAK)3ALEA(EAAAK)3A(“A(EAAAK)3ALEA(EAAAK)3ALEA(EAAAK)3A” disclosed as SEQ ID NO: 57)(SEQ ID NO: 69) ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAgctGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGA
AGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATC
GACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa39. Sp_del(175-307)-A(EAAAK)3LE(EAAAK)3LE(EAAAK)3LE(EAAAK)3A(“A(EAAAK)3LE(EAAAK)3LE(EAAAK)3LE(EAAAK)3A” disclosed as SEQ ID NO: 59)(SEQ ID NO: 70) ATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGccCTGGAGgctGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAGAAGCTGCTGCTAAAgctGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCT
GGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCA
GCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa

Example 3 New Nickases

FIGS. 2A-C pertain to new SpCas9 nickases and provide: A. Schematic showing catalytic domains of SpCas9, and sites of mutagenesis for putative new nickases. RuvC domains I, II, and III are shown in orange, HNH domain in white between RuvCII and RuvCIII. Domain sizes are not drawn to scale. B. Schematic showing locations of sgRNAs used for testing double nicking: when sgRNAs are transfected singly (A1 or C1 alone) with SpCas9 nickases, no indels should result. The combination A1+C1, used with RuvCIII mutation nickases, results in a 5′-overhang, whereas D1+A1 and C7+A1 would result in 3′-overhangs. Conversely, those three combinations used with HNH mutation nickases would result in 3′-, 5′-, and 5′-overhangs, respectively. C. Surveyor test showing one HNH mutant that retains nuclease activity (N854A) and one HNH mutant that shows nickase activity (N863A), as well as two RuvCIII mutants that show nickase activity (H983A, D986A).
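The overhang polarities stated for FIGS. 2A-C can be collected in a small lookup, given here as an illustrative Python sketch. The sgRNA pair labels and polarities are taken from the description above; the names `RUVC_OVERHANG` and `expected_overhang` are hypothetical and are used for illustration only.

```python
# Overhang polarity stated in the text for each sgRNA pair when used with
# a RuvCIII-mutant nickase. For HNH-mutant nickases the text reports the
# opposite polarity for every pair, which is modeled here as a simple flip.
RUVC_OVERHANG = {
    ("A1", "C1"): "5'",
    ("D1", "A1"): "3'",
    ("C7", "A1"): "3'",
}

FLIP = {"5'": "3'", "3'": "5'"}


def expected_overhang(pair, mutant_domain):
    """Return the overhang polarity the text reports for a pair and nickase class."""
    ruvc_polarity = RUVC_OVERHANG[tuple(pair)]
    if mutant_domain == "RuvCIII":
        return ruvc_polarity
    if mutant_domain == "HNH":
        return FLIP[ruvc_polarity]
    raise ValueError("mutant_domain must be 'RuvCIII' or 'HNH'")


for pair in RUVC_OVERHANG:
    print(pair, expected_overhang(pair, "RuvCIII"), expected_overhang(pair, "HNH"))
```

The lookup only restates the three combinations named in the text; it does not predict polarity for other sgRNA pairs.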

In this Example, five potential nicking mutation sites were chosen based on sequence homology between Cas9 orthologs, and three additional sites were chosen based on the crystallography data herein. A subset of these nickase mutant Cas9s was re-cloned to incorporate both N′- and C′-NLS sequences that are identical to those of optimized SpCas9. Sequences are below.

Nickase mutants were re-cloned to incorporate the designated mutations into a pAAV vector under the Cbh promoter and were sequence validated.

Nuclease and double-nicking activities for all potential nickases were tested in HEK 293FT cells as follows: co-transfection of 400 ng of nickase and 100 ng of U6-driven sgRNA (100 ng for one guide, or 50 ng each for a pair of sgRNAs) by Lipofectamine 2000 into ˜200,000 cells. DNA from transfected cells was collected for SURVEYOR analysis. Nickases do not result in indel mutations when co-transfected with a single sgRNA, but do when co-transfected with a pair of appropriately offset sgRNAs. Based on data from the original D10A SpCas9 nickase, the pair of sgRNAs chosen (A1/C1) for RuvC domain mutants has a 0-bp offset and a 5′-overhang for maximal cleavage.
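The plasmid amounts in the transfection described above can be tabulated with a short Python sketch. The 400 ng and 100 ng figures come from the text; the function name `transfection_mix` and the dictionary keys are hypothetical.

```python
NICKASE_NG = 400      # ng of nickase plasmid per transfection (from the text)
SGRNA_TOTAL_NG = 100  # ng of U6-driven sgRNA in total (from the text)


def transfection_mix(guides):
    """Return ng of each plasmid for one guide or a pair of guides."""
    if len(guides) not in (1, 2):
        raise ValueError("the text describes one guide or a pair of guides")
    per_guide = SGRNA_TOTAL_NG / len(guides)  # 100 ng alone, 50 ng each as a pair
    mix = {"nickase": NICKASE_NG}
    for name in guides:
        mix["sgRNA_" + name] = per_guide
    return mix


print(transfection_mix(["A1"]))
print(transfection_mix(["A1", "C1"]))
```

In both cases the total DNA stays at 500 ng, so single-guide and paired-guide conditions are compared at equal DNA load.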

Homology set: Mutant domain Functional? Cbh-hSpCas9(D10A)-NLS RuvCInickase activity Cbh-hSpCas9(E762A)-NLS RuvCII Cbh-hSpCas9(H840A)-NLSHNH no activity Cbh-hSpCas9(N854A)-NLS HNH wt nuclease activityCbh-hSpCas9(N863A)-NLS HNH nickase activity Cbh-hSpCas9(D986A)-NLSRuvCIII Crystal set set: Mutant domain Functional? NLS-S15A-NLS RuvCIwt nuclease activity NLS-E762A-NLS RuvCII catalytically deadNLS-H982A-NLS RuvCIII wt nuclease activity NLS-H983A-NLS RuvCIIInickase activity NLS-D986A-NLS RuvCIII nickase activity >NLS-S15A-NLSATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACgccGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCC
TGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTC
TGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa(SEQ ID NO: 71) >NLS-E762A-NLSATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACC
TGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGccATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAG
ACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa(SEQ ID NO: 72) >NLS-H982A-NLSATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTC
TATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACgccCACGCCCACGACG
CCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa(SEQ ID NO: 73) 
>NLS-H983A-NLSATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAA
GTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACgccGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTT
TACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAGtaa(SEQ ID NO: 74) >NLS-D986A-NLSATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACT
TCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGcCGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTC
CTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 75)
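In the construct sequences listed above, the substituted bases appear in lowercase (for example `gcc` in NLS-S15A-NLS). The following Python sketch locates such marked codons and translates them. The function name `marked_codons`, the use of the standard genetic code, and the residue offset of 16 codons are assumptions; the offset was inferred by counting the leader codons that precede the first Cas9 codon in the listed sequences and should be checked against the actual construct.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def marked_codons(seq, residue_offset=16):
    """List (residue_number, codon, amino_acid) for codons containing lowercase bases.

    residue_offset is the assumed number of codons to subtract from the
    1-based codon index to obtain SpCas9 residue numbering.
    """
    hits = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if any(base.islower() for base in codon):
            codon_number = i // 3 + 1
            hits.append((codon_number - residue_offset, codon, CODON_TABLE[codon.upper()]))
    return hits


# First 34 codons of the NLS-S15A-NLS listing, spaced here for readability.
S15A_PREFIX = (
    "ATG GCC CCA AAG AAG AAG CGG AAG GTC GGT ATC CAC GGA GTC CCA GCA GCC "
    "GAC AAG AAG TAC AGC ATC GGC CTG GAC ATC GGC ACC AAC gcc GTG GGC TGG"
).replace(" ", "")

print(marked_codons(S15A_PREFIX))  # [(15, 'gcc', 'A')]
```

Run on a full listing, the same function would also report the lowercase `taa` stop codon where one is present, since it only looks at letter case.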

Example 4 Truncating and Creating Chimeric Cas9s Based on S. pyogenes Cas9 Crystal Structure Herein

FIGS. 5A-C pertain to truncating and creating chimeric Cas9s based on the herein crystal structure. These figures provide schematics illustrating A. SpCas9 mutants designed for mapping out essential functional domains of Cas9 for truncation of protein. B. Chimeric Cas9s that contain sequences (regions in pink) from Cas9 from S. thermophilus CRISPR 1, S. thermophilus CRISPR 3, Staphylococcus aureus, Neisseria meningitidis, or other Cas9 orthologs. C. Designs for creating chemically inducible dimerization of SpCas9. The chemically inducible SpCas9 functions.

DNA sequences for chimeric Cas9s are optimized for human expression by GenScript and synthesized de novo. Chimeric Cas9 proteins can be constructed by cloning and ligating individual functional domains from Cas9 orthologs (i.e. by PCR-amplifying individual functional domains from a desired Cas9 ortholog, then assembling the pieces together by either Gibson or Golden Gate cloning). Additionally, a set of chemically inducible Cas9s was constructed as two-component systems, where one portion of the Cas9 protein is fused to FKBP and the remainder is fused to FRB (e.g. FKBP-Cas9(amino acids 1-1098), FRB-Cas9(1099-1368)). In the absence of chemical induction, the two co-transfected inducible Cas9 components have no catalytic activity, but the functional assembly of the components may be induced using Rapamycin [5 nM to 10 μM].
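A two-component split such as the one described above only reconstitutes a full-length protein if the fragments tile the sequence without gaps or overlaps. A minimal Python check follows. The residue ranges are those given in the text; the function name `tiles_full_length` and the `SPLIT` dictionary are hypothetical.

```python
def tiles_full_length(fragments, length):
    """Check that 1-based inclusive (start, end) fragments cover 1..length exactly once."""
    expected_start = 1
    for start, end in sorted(fragments):
        if start != expected_start or end < start:
            return False  # gap, overlap, or empty fragment
        expected_start = end + 1
    return expected_start == length + 1


# Fragment boundaries from the example in the text: fusion partner -> Cas9 residues.
SPLIT = {"FKBP": (1, 1098), "FRB": (1099, 1368)}

print(tiles_full_length(SPLIT.values(), 1368))
```

The same check can be applied to any alternative split point before the constructs are cloned.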

Example 5 Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA

Cas9 is an RNA-guided nuclease from the microbial CRISPR-Cas system that can be targeted to specific genomic loci by single guide RNAs (sgRNAs). Applicants report the crystal structure of Streptococcus pyogenes Cas9 in complex with sgRNA and its target DNA at 2.4 Å resolution. The structure revealed a bilobed architecture composed of target recognition and nuclease lobes, accommodating an sgRNA:DNA duplex in a positively-charged groove at their interface. Whereas the recognition lobe is essential for sgRNA and DNA binding, the nuclease lobe contains the HNH and RuvC nuclease domains, which are properly positioned for the cleavage of complementary and non-complementary strands of the target DNA, respectively. This high-resolution structure and accompanying functional analyses elucidate the molecular mechanism of RNA-guided DNA targeting by Cas9, paving the way for rational design of new and versatile genome-editing technologies.

The CRISPR (clustered regularly interspaced short palindromic repeats)-Cas system is a naturally occurring microbial adaptive immune system for defense against invading phages and other mobile genetic elements (Deveau et al., 2010; Horvath and Barrangou, 2010; Marraffini and Sontheimer, 2010; Terns and Terns, 2011). Three types (I-III) of CRISPR-Cas systems have been functionally identified across a wide range of microbial species (Barrangou et al., 2007; Brouns et al., 2008; Marraffini and Sontheimer, 2008), each containing a cluster of CRISPR-associated (Cas) genes and its corresponding CRISPR array. These characteristic CRISPR arrays consist of repetitive sequences (direct repeats, referred to as repeats) interspaced by short stretches of non-repetitive sequences (spacers) derived from short segments of foreign genetic material (protospacers). The CRISPR array is transcribed and processed into short CRISPR RNAs (crRNAs), which direct Cas proteins to the target nucleic acids, DNA or RNA, via Watson-Crick base pairing to facilitate the nucleic acid destruction.

Type I and III CRISPR systems utilize ensembles of Cas proteins in complex with crRNA to mediate recognition and subsequent degradation of target nucleic acids (Spilman et al., 2013; Wiedenheft et al., 2011). In contrast, the Type II CRISPR system achieves recognition and cleavage of the target DNA (Garneau et al., 2010) via a single enzyme called Cas9 (Sapranauskas et al., 2011) along with two non-coding RNAs, the crRNA and a trans-activating crRNA (tracrRNA) (Deltcheva et al., 2011). The crRNA hybridizes with the tracrRNA to form a crRNA:tracrRNA duplex, which is then loaded onto Cas9 to direct cleavage of cognate DNA sequences bearing appropriate protospacer adjacent motifs (PAM) (Mojica et al., 2009).

The Type II CRISPR system was the first to be adapted for facilitatinggenome editing in eukaryotic cells (Cong et al., 2013; Mali et al.,2013b). The Cas9 protein from Streptococcus pyogenes, along with asingle guide RNA (sgRNA), a synthetic fusion of crRNA and minimaltracrRNA (Jinek et al., 2012), could be programmed to instruct cleavageof virtually any sequence preceding a 5′-NGG PAM sequence in mammaliancells (Cong et al., 2013; Mali et al., 2013b). This unprecedentedflexibility has enabled a broad range of applications including rapidgeneration of genetically modified cells and animal models (Gratz etal., 2013; Hwang et al., 2013; Wang et al., 2013; Yang et al., 2013),and genome-scale genetic screening (Qi et al., 2013; Shalem et al.,2014; Wang et al., 2014).
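By way of illustration only, the targeting rule stated above (a 20-nt target sequence immediately preceding a 5′-NGG PAM) can be sketched as a scan along one DNA strand. The function name `find_ngg_targets` and the example sequence are hypothetical and are not part of Applicants' methods; the sketch examines only the strand supplied, so the opposite strand would need to be scanned separately as its reverse complement.

```python
def find_ngg_targets(seq, guide_len=20):
    """Return (start, protospacer, pam) for every position on the given strand
    where a guide_len-nt protospacer is immediately followed by 5'-NGG."""
    seq = seq.upper()
    hits = []
    # The last usable start leaves room for the protospacer plus a 3-nt PAM.
    for i in range(len(seq) - guide_len - 2):
        pam = seq[i + guide_len : i + guide_len + 3]
        if pam[1:] == "GG":  # first PAM base is N (any base)
            hits.append((i, seq[i : i + guide_len], pam))
    return hits


# Hypothetical 26-nt sequence: a 20-nt protospacer, a TGG PAM, then AAA.
example = "ACGT" * 5 + "TGG" + "AAA"
targets = find_ngg_targets(example)
```

For the example above, the single candidate site starts at index 0 and carries the PAM TGG.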

However, despite brisk progress in the development of the Cas9 technology, the mechanism by which the Cas9-sgRNA complex recognizes and cleaves its target DNA remains to be elucidated. To date, biochemical analyses at the domain level have enabled site-specific engineering to convert the native Cas9 into a DNA nicking enzyme (Gasiunas et al., 2012; Jinek et al., 2012; Sapranauskas et al., 2011) that facilitates homology-directed repair in eukaryotic cells (Cong et al., 2013; Mali et al., 2013b) and further cleaves DNA with improved specificity given appropriately paired sgRNAs (Mali et al., 2013a; Ran et al., 2013). Moreover, a catalytically inactive Cas9 can serve as an RNA-guided DNA-binding platform to target effector domains and modulate endogenous transcription (Gilbert et al., 2013; Konermann et al., 2013; Maeder et al., 2013; Perez-Pinera et al., 2013; Qi et al., 2013). These Cas9 engineering advances represent just the first steps of what is possible in fully realizing the potential of this flexible RNA-guided genome positioning system. Precise structural information on Cas9 will thus not only enhance the understanding of how this elegant RNA-guided microbial adaptive immune system functions, but also inform further improvements of Cas9 targeting specificity, simplification of in vitro and in vivo delivery, and engineering of Cas9 for novel functions and optimized features.

In this example, Applicants report the crystal structure of S. pyogenesCas9 in complex with sgRNA and its target DNA at 2.4 Å resolution. Thishigh-resolution structure along with functional analysis reveals the keyfunctional interactions that integrate the guide RNA, target DNA, andCas9 protein, paving the way towards enhancing Cas9 function as well asengineering novel applications.

Overall structure of the Cas9-sgRNA-DNA ternary complex: Applicantssolved the crystal structure of full-length S. pyogenes Cas9 (residues1-1368; D10A/C80L/C574E/H840A) in complex with a 98-nucleotide (nt)sgRNA and a 23-nt target DNA, at 2.4 Å resolution, by the SAD(single-wavelength anomalous dispersion) method using a SeMet-labeledprotein (FIG. 15 and Table 1). To improve the solution behavior of Cas9,Applicants replaced two less conserved cysteine residues (Cys80 andCys574) with leucine and glutamic acid, respectively. This C80L/C574Emutant retained the ability to efficiently cleave genomic DNA in humanembryonic kidney 293FT (HEK293FT) cells, confirming that these mutationshave no effects on Cas9 nuclease function (FIG. 16). Additionally, toprevent cleavage of the target DNA during crystallization, Applicantsreplaced the two catalytic residues, Asp10 from the RuvC domain andHis840 from the HNH domain, with alanine.

TABLE 1 Data collection and refinement statistics

                          Native Cas9            SeMet Cas9
Data collection
  Beamline                SPring-8 BL32XU        SPring-8 BL41XU
  Wavelength (Å)          1.000                  0.9791
  Space group             P1                     P1
  Cell dimensions
    a, b, c (Å)           76.7, 105.7, 126.8     76.2, 104.5, 125.5
    α, β, γ (°)           97.7, 98.4, 100.3      97.0, 98.2, 101.1
  Resolution (Å)          50-2.4 (2.54-2.4)      50-2.6 (2.67-2.6)
  R_(sym)                 0.07 (1.53)            0.167 (1.96)
  I/σI                    22.53 (1.45)           12.62 (1.44)
  Completeness (%)        98.2 (96.3)            99.9 (99.9)
  Redundancy              7.93 (7.88)            19.1 (15.9)
  CC(½)                   0.999 (0.671)          0.999 (0.736)
Refinement
  Resolution (Å)          50-2.4
  No. reflections         146,862
  R_(work)/R_(free)       0.241/0.276
  No. atoms
    Protein               19,021
    Nucleic acid          5,013
    Solvent               200
  B-factors
    Protein               72.6
    Nucleic acid          72.6
    Solvent               53.3
  R.m.s deviations
    Bond lengths (Å)      0.002
    Bond angles (°)       0.454
  Ramachandran plot
    Favored region        96.8%
    Allowed region        3.2%
    Outlier region        0.0%
*Highest resolution shell is shown in parentheses.

The crystallographic asymmetric unit contained two Cas9-sgRNA-DNAternary complexes (Mol A and Mol B). Although there are conformationaldifferences between the two complexes, sgRNA and DNA are recognized byCas9 in a similar manner. Most notably, while the HNH domain in Mol A isconnected with the RuvC domain by a disordered linker, the HNH domain inMol B is not visible in the electron density map, indicating theflexible nature of the HNH domain. Thus, Applicants first describe thestructural features of Mol A unless otherwise stated, and then discussthe structural differences between the two complexes, which suggest theconformational flexibility of Cas9.

The crystal structure revealed that Cas9 consists of two lobes, arecognition (REC) lobe and a nuclease (NUC) lobe (FIG. 8A-C). The REClobe can be divided into three regions, a long α-helix referred to asBridge helix (BH) (residues 60-93), the REC1 (residues 94-179 and308-713), and REC2 (residues 180-307) domains (FIG. 8A-C). The NUC lobeconsists of the RuvC (residues 1-59,718-769, and 909-1098), HNH(residues 775-908), and PAM-interacting (PI) (residues 1099-1368)domains (FIG. 8A-C). The negatively-charged sgRNA:DNA hybrid duplex isaccommodated in a positively-charged groove at the interface between theREC and NUC lobes (FIG. 8D). In the NUC lobe, the RuvC domain isassembled from the three split RuvC motifs (RuvC I-III), whichinterfaces with the PI domain to form a positively-charged surface thatinteracts with the 3′ tail of the sgRNA (FIG. 8D). The HNH domain liesin between the RuvC II-III motifs and forms only a few contacts with therest of the protein.
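The domain boundaries recited above lend themselves to a simple lookup, sketched below for illustration. The residue ranges are taken from the text; the names `SPCAS9_SEGMENTS`, `domain_of` and `lobe_of` are hypothetical. Residues 714-717 and 770-774 are not assigned to any domain in the text, so the sketch returns `None` for them rather than guessing.

```python
# Residue ranges (inclusive) of S. pyogenes Cas9 domains as recited in the text.
SPCAS9_SEGMENTS = [
    ("RuvC", 1, 59),
    ("Bridge helix", 60, 93),
    ("REC1", 94, 179),
    ("REC2", 180, 307),
    ("REC1", 308, 713),
    ("RuvC", 718, 769),
    ("HNH", 775, 908),
    ("RuvC", 909, 1098),
    ("PI", 1099, 1368),
]

# Lobe membership as described: Bridge helix, REC1 and REC2 form the REC lobe;
# RuvC, HNH and PI form the NUC lobe.
LOBE_OF_DOMAIN = {
    "Bridge helix": "REC", "REC1": "REC", "REC2": "REC",
    "RuvC": "NUC", "HNH": "NUC", "PI": "NUC",
}


def domain_of(residue):
    """Return the domain containing the residue, or None if the text assigns none."""
    for name, first, last in SPCAS9_SEGMENTS:
        if first <= residue <= last:
            return name
    return None


def lobe_of(residue):
    """Return 'REC' or 'NUC' for the residue, or None if it is unassigned."""
    return LOBE_OF_DOMAIN.get(domain_of(residue))
```

Under this mapping, for example, the catalytic residues Asp10 and His840 fall in the RuvC and HNH domains of the NUC lobe, respectively.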

The REC lobe of Cas9 interacted with the repeat:anti-repeat duplex: TheREC lobe comprises the REC1 and REC2 domains. REC1 adopted an elongated,α-helical structure comprising 26 α-helices (α2-α5 and α12-α33) and twoβ-sheets (β6/β10 and β7-β9), whereas REC2 adopted a six-helix bundlestructure (α6-α11) (FIGS. 9A and 17). A Dali search (Holm andRosenstrom, 2010) revealed that the REC lobe did not share structuralsimilarity with other known proteins, indicating that it is aCas9-specific functional domain.

The REC lobe is one of the least conserved regions across the three families of Cas9 within the Type II CRISPR system (IIA, IIB and IIC) and many Cas9s contain significantly shorter REC lobes (FIGS. 18, 19). Applicants hypothesized that truncations in the REC lobe could be tolerated. As expected, and consistent with the observation that the REC2 domain does not contact the bound sgRNA:DNA hybrid duplex, a Cas9 mutant lacking the REC2 domain (Δ175-307) showed ~50% of the wild-type Cas9 activity (FIG. 9B), indicating that the REC2 domain is not critical for DNA cleavage. The lower cleavage efficiency may be attributed in part to the reduced levels of Cas9 (Δ175-307) expression relative to that of the wild-type protein (FIG. 9C). In striking contrast, deletion of the crRNA repeat-interacting region (Δ97-150) or tracrRNA anti-repeat-interacting region (Δ312-409) of the REC1 domain abolished DNA cleavage activity (FIG. 9B), indicating that the recognition of the repeat:anti-repeat duplex by the REC1 domain is critical for Cas9 function.

The PAM-interacting (PI) domain confers PAM specificity: The NUC lobecontains the PI domain, which adopts an elongated structure comprisingseven α-helices (α47-α53), a three-stranded antiparallel β-sheet(β18-β20), a five-stranded antiparallel β-sheet (β21-β23, β26 and β27),and two-stranded antiparallel β-sheet (β24 and β25) (FIGS. 9D and 17).Similar to the REC lobe, the PI domain also represents a novel proteinfold unique to the Cas9 family.

The locations of the bound complementary strand DNA and the active site of the RuvC domain in the present structure suggest that the PI domain is positioned to recognize the PAM sequence on the non-complementary strand of the target DNA. Applicants tested whether replacement of the S. pyogenes Cas9 (SpCas9; Cas9 in this study) PI domain with that of an orthologous Cas9 protein recognizing a different PAM would be sufficient to alter SpCas9 PAM specificity. The Streptococcus thermophilus CRISPR-3 Cas9 (St3Cas9) shares ~60% sequence identity with SpCas9; furthermore, their crRNA repeats and tracrRNAs are interchangeable (Fonfara et al., 2013). However, SpCas9 and St3Cas9 require different PAM sequences (5′-NGG for SpCas9 and 5′-NGGNG for St3Cas9) for target DNA cleavage (Fonfara et al., 2013).

Applicants swapped the two PI domains to generate two chimeras,Sp-St3Cas9 (SpCas9 with the PI domain of St3Cas9) and St3-SpCas9(St3Cas9 with the PI domain of SpCas9), and examined their cleavageactivities for target DNA sequences bearing 5′-NGG PAM (5′-GGGCT) or5′-NGGNG PAM (5′-GGGCG) (FIG. 9E). SpCas9 and St3-SpCas9, but notSt3Cas9, cleaved the target DNA with 5′-NGG PAM (FIG. 9E), indicatingthat the PI domain of SpCas9 is required for the recognition of 5′-NGGPAM and is sufficient to alter the PAM recognition of St3Cas9.Sp-St3Cas9 retained cleavage activity for the target DNA with 5′-NGGPAM, albeit at a lower level than that of SpCas9 (FIG. 9E).Additionally, deletion of the PI domain (Δ1099-1368) abolished thecleavage activity (FIG. 9E), indicating that the PI domain is criticalfor Cas9 function. These results reveal that the PI domain is a majordeterminant of PAM specificity.
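The choice of the two test PAMs above can be illustrated with a minimal motif check, in which N stands for any base. The function `pam_matches` is a hypothetical helper and models only sequence matching, not cleavage activity. It shows why 5′-GGGCT discriminates between the two PAM requirements, whereas 5′-GGGCG satisfies both.

```python
def pam_matches(site, motif):
    """Return True if `site` begins with `motif`, where 'N' in the motif
    matches any base. Only the first len(motif) bases of `site` are compared."""
    site, motif = site.upper(), motif.upper()
    if len(site) < len(motif):
        return False
    return all(m == "N" or m == s for m, s in zip(motif, site))


# The two PAM-bearing sequences used in the chimera experiment.
results = {
    (site, motif): pam_matches(site, motif)
    for site in ("GGGCT", "GGGCG")
    for motif in ("NGG", "NGGNG")
}
```

In this sketch, GGGCT satisfies NGG but not NGGNG, and GGGCG satisfies both motifs.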

The RuvC domain targets the non-complementary strand DNA: The RuvC domain consists of a six-stranded mixed β-sheet (β1, β2, β5, β11, β14 and β17) flanked by α-helices (α34, α35 and α40-α46) and two additional two-stranded antiparallel β-sheets (β3/β4 and β15/β16) (FIGS. 10A and 17). It shares structural similarity with retroviral integrase superfamily members characterized by an RNase H fold, such as Escherichia coli RuvC (PDB code 1HJR, 13% identity, root-mean-square deviation (rmsd) of 3.4 Å for 123 equivalent Cα atoms) (Ariyoshi et al., 1994) and Thermus thermophilus RuvC (PDB code 4LD0, 17% identity, rmsd of 3.4 Å for 129 equivalent Cα atoms) (Gorecka et al., 2013) (FIG. 10B). RuvC nucleases have four catalytic residues (e.g., Asp7, Glu70, His143 and Asp146 in T. thermophilus RuvC), and cleave Holliday junctions through a two-metal mechanism (Ariyoshi et al., 1994; Chen et al., 2013; Gorecka et al., 2013). Asp10 (Ala), Glu762, His983 and Asp986 of the Cas9 RuvC domain are located at positions similar to those of the catalytic residues of T. thermophilus RuvC (FIG. 10A, B), consistent with the previous results that the D10A mutation abolished cleavage of the non-complementary DNA strand and that Cas9 requires Mg2+ ions for cleavage activity (Gasiunas et al., 2012; Jinek et al., 2012). Moreover, alanine substitution of Glu762, His983 or Asp986 also converted Cas9 into a nickase (FIG. 10C, D). Each nickase mutant was able to facilitate targeted double strand breaks using pairs of juxtaposed sgRNAs (FIG. 10C, D), as demonstrated with the D10A nickase previously (Ran et al., 2013). This combination of structural observations and mutational analysis suggests that the Cas9 RuvC domain cleaves the non-complementary strand of the target DNA through the two-metal mechanism previously observed for other retroviral integrase superfamily nucleases.

It is important to note that there are key structural dissimilarities between the Cas9 RuvC domain and RuvC nucleases, explaining their functional differences. Unlike the Cas9 RuvC domain, RuvC nucleases form dimers and recognize a Holliday junction (Gorecka et al., 2013) (FIG. 10B). In addition to the conserved RNase H fold, the RuvC domain of Cas9 has additional structural elements involved in the interactions with the guide:DNA duplex (an end-capping loop between α43 and α44), and the PI domain/stem loop 3 (β-hairpin formed by β3 and β4) (FIG. 10A).

The HNH domain targets the complementary strand DNA: The HNH domain comprises a two-stranded antiparallel β-sheet (β12 and β13) flanked by four α-helices (α36-α39) (FIG. 10E). Likewise, it shares structural similarity with HNH endonucleases characterized by a ββα-metal fold, such as the phage T4 endonuclease VII (Endo VII) (Biertumpfel et al., 2007) (PDB code 2QNC, 8% identity, rmsd of 2.6 Å for 60 equivalent Cα atoms) (FIG. 10F) and Vibrio vulnificus nuclease (Li et al., 2003) (PDB code 1OUP, 8% identity, rmsd of 2.9 Å for 78 equivalent Cα atoms). HNH nucleases have three catalytic residues (e.g., Asp40, His41, and Asn62 in Endo VII), and cleave nucleic acid substrates through a single-metal mechanism (Biertumpfel et al., 2007; Li et al., 2003). In the structure of the Endo VII N62D mutant in complex with a Holliday junction, a Mg2+ ion is coordinated by Asp40, Asp62, and oxygen atoms of the scissile phosphate group of the substrate, while His41 acts as a general base to activate a water molecule for catalysis (FIG. 10F). Asp839, His840, and Asn863 of the Cas9 HNH domain correspond to Asp40, His41, and Asn62 of Endo VII, respectively (FIG. 10E), consistent with the observation that His840 is critical for the cleavage of the complementary DNA strand (Gasiunas et al., 2012; Jinek et al., 2012). The N863A mutant functions as a nickase (FIG. 10C, D), indicating that Asn863 participates in catalysis. These observations suggest that the Cas9 HNH domain may cleave the complementary strand of the target DNA through a single-metal mechanism as observed for other HNH superfamily nucleases. However, in the present structure, Asn863 of Cas9 is located at a position different from that of Asn62 in Endo VII (Biertumpfel et al., 2007), whereas Asp839 and His840 (Ala) of Cas9 are located at positions similar to those of Asp40 and His41 of Endo VII, respectively (FIG. 10E, F). This might be due to the absence of divalent ions, such as Mg2+, in Applicants' crystallization solution, suggesting that Asn863 can point towards the active site and participate in catalysis. Whereas the HNH domain shares a ββα-metal fold with other HNH endonucleases, their overall structures are different (FIG. 10E, F), consistent with the differences in their substrate specificities.

sgRNA recognizes target DNA via Watson-Crick base pairing: The sgRNAconsists of crRNA- and tracrRNA-derived sequences connected by anartificial tetraloop (FIG. 11A). The crRNA sequence can be subdividedinto guide (20-nt) and repeat (12-nt) regions, and the tracrRNA sequencelikewise into anti-repeat (14-nt) and three tracrRNA stem loops (FIG.11A). The crystal structure reveals that the sgRNA binds the target DNAto form a T-shaped architecture comprising a guide:DNA duplex,repeat:anti-repeat duplex and stem loops 1-3 (FIG. 11A, B). Therepeat:anti-repeat duplex and stem loop 1 are connected by a singlenucleotide (A51), and stem loops 1 and 2 are connected by a 5-ntsingle-stranded linker (nucleotides 63-67).

The guide (nucleotides 1-20) and target DNA (nucleotides 3′-23′) formthe guide:DNA hybrid duplex via 20 Watson-Crick base pairs, with theconformation of the duplex distorted from a canonical A-form RNA duplex(FIGS. 11B and 20). The crRNA repeat (nucleotides 21-32) and tracrRNAanti-repeat (nucleotides 37-50) form the repeat:anti-repeat duplex vianine Watson-Crick base pairs (U22:A49-A26:U45 and G29:C40-A32:U37) (FIG.11A, B). Within this region, G27, A28, A41, A42, G43, and U44 areunpaired, with A28 and U44 flipped out from the duplex (FIG. 11C). Thenucleobases of G27 and A41 stack with the A26:U45 and G29:C40 pairs,respectively, and the 2-amino group of G27 interacts with the backbonephosphate group between G43 and U44, stabilizing the duplex structure(FIG. 11C). G21 and U50 form a wobble base pair at the three-wayjunction between the guide:DNA/repeat:anti-repeat duplexes and stem loop1, stabilizing the T-shaped architecture (FIG. 11C).

As expected from the RNA-fold predictions of the nucleotide sequence, the tracrRNA 3′ tail (nucleotides 68-81 and 82-96) forms stem loops 2 and 3 via four and six Watson-Crick base pairs (A69:U80-U72:A77 and G82:C96-G87:C91), respectively (FIG. 11A, B). Although previously unappreciated, nucleotides 52-62 also form a stem loop (stem loop 1) via three Watson-Crick base pairs (G53:C61, G54:C60 and C55:G58), with U59 flipped out from the stem (FIG. 11A, B). Stem loop 1 is stabilized by the G62-G53:C61 stacking interaction and the G62-A51/A52 polar interactions (FIG. 11C).
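The sgRNA numbering recited above can be cross-checked with a short sketch, offered for illustration only. The region boundaries and helix end points are taken from the text. The assignment of nucleotides 33-36 to the tetraloop is an assumption inferred from the gap between the repeat (21-32) and the anti-repeat (37-50), and the names `SGRNA_REGIONS`, `region_lengths` and `helix_pairs` are hypothetical. Nucleotides 97-98 of the 98-nt sgRNA are not assigned in the text and are omitted.

```python
# sgRNA regions (1-based, inclusive) as recited in the text.
SGRNA_REGIONS = [
    ("guide", 1, 20),
    ("repeat", 21, 32),
    ("tetraloop", 33, 36),  # assumption: inferred from the 32/37 boundaries
    ("anti-repeat", 37, 50),
    ("A51 connector", 51, 51),
    ("stem loop 1", 52, 62),
    ("linker", 63, 67),
    ("stem loop 2", 68, 81),
    ("stem loop 3", 82, 96),
]


def region_lengths(regions):
    """Verify that the regions are contiguous and return {name: length}."""
    lengths = {}
    expected_start = regions[0][1]
    for name, first, last in regions:
        if first != expected_start:
            raise ValueError(f"gap or overlap before region {name!r}")
        lengths[name] = last - first + 1
        expected_start = last + 1
    return lengths


def helix_pairs(five_start, three_start, five_end, three_end):
    """Enumerate antiparallel base pairs running from five_start:three_start
    to five_end:three_end (e.g. U22:A49 through A26:U45)."""
    n = five_end - five_start + 1
    if three_start - three_end + 1 != n:
        raise ValueError("the two strands of the helix differ in length")
    return [(five_start + k, three_start - k) for k in range(n)]


lengths = region_lengths(SGRNA_REGIONS)
repeat_antirepeat = helix_pairs(22, 49, 26, 45) + helix_pairs(29, 40, 32, 37)
stem_loop_2 = helix_pairs(69, 80, 72, 77)
stem_loop_3 = helix_pairs(82, 96, 87, 91)
```

The enumerated helices contain nine, four and six pairs, respectively, in agreement with the counts given in the text, and the region lengths reproduce the stated 20-nt guide, 12-nt repeat, 14-nt anti-repeat and 5-nt linker.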

The guide:DNA and repeat:anti-repeat duplexes are accommodated anddeeply buried in a positively-charged groove at the interface of the twolobes, while the rest of the sgRNA extensively interacts with thepositively-charged surface on the back side of the protein (FIG. 8D). InMol A, the 3′-terminal bases of the target DNA (3′-ACC complementary tothe PAM) are not visible in the electron density map. In contrast, thetwo adjacent bases (3′-AC) in Mol B are not recognized by Cas9, althoughthey are structurally ordered due to the crystal packing interactionsand are visible in the electron density map. These observations suggestthat the 3′-ACC sequence complementary to the PAM (5′-TGG) is notrecognized by Cas9, consistent with the previous biochemical datademonstrating that Cas9-catalyzed DNA cleavage requires the 5′-NGG PAMon the non-complementary strand, but not the 3′-NCC sequence on thecomplementary strand (Jinek et al., 2012).

Previous studies showed that although sgRNA with a 48-nt tracrRNA tail(referred to as sgRNA(+48)) is a minimal region for the Cas9-catalyzedDNA cleavage in vitro (Jinek et al., 2012), sgRNAs with extendedtracrRNA tails, sgRNA(+67) and sgRNA(+85), dramatically improved Cas9cleavage activity in vivo (Hsu et al., 2013). The present structurerevealed that sgRNA(+48), sgRNA(+67) and sgRNA(+85) contain stem loop 1,stem loops 1-2 and stem loops 1-3, respectively (FIG. 11A, B). Theseobservations indicated that, whereas stem loop 1 is essential for theformation of the functional Cas9-sgRNA complex, stem loops 2 and 3further support the stable complex formation as well as enhance sgRNAstability, thus improving the in vivo activity.

To confirm the significance of each sgRNA structural component on Cas9function, Applicants tested a number of sgRNAs with mutations in therepeat:anti-repeat duplex, stem loops 1-3, and the linker between stemloops 1 and 2. Applicants' results revealed that, whereas stem loops 2and 3 as well as the linker region can tolerate a large number ofmutations, the repeat:anti-repeat duplex and stem loop 1 are criticalfor Cas9 function (FIG. 11D). Moreover, the sgRNA sequence can toleratea large number of mutations (FIG. 11D, reconstructed sgRNA). Theseresults highlight the functional significance of the structure-dependentrecognition of the repeat:anti-repeat duplex by Cas9.

Conserved arginine cluster on Bridge helix plays a critical role in sgRNA:DNA interaction: The crRNA guide region is primarily recognized by the REC lobe (FIG. 12A). The backbone phosphate groups of the crRNA guide region (nucleotides 4-6 and 13-20) interact with the REC1 domain (Arg165, Gly166, Arg403, Asn407, Lys510, Tyr515 and Arg661) and Bridge helix (Ala59, Arg63, Arg66, Arg70, Arg71, Arg74 and Arg78) (FIG. 12B), and the 2′-hydroxyl groups of C15, U16 and G19 hydrogen bond with Tyr450, Arg447/Ile448 and Thr404 in the REC1 domain (FIG. 12B), respectively. These observations suggested that the Watson-Crick faces of eight PAM-proximal nucleotides of the Cas9-bound sgRNA are exposed to the solvent, thus serving as a nucleation site for pairing with the target complementary strand. This is consistent with previous reports that the 10-12 bp PAM-proximal “seed” region is critical for Cas9-catalyzed DNA cleavage (Cong et al., 2013; Fu et al., 2013; Hsu et al., 2013; Jinek et al., 2012; Mali et al., 2013a; Pattanayak et al., 2013).

Mutational analysis demonstrated that the R66A, R70A and R74A mutations on Bridge helix markedly reduced DNA cleavage activities (FIG. 12C), highlighting the functional significance of the recognition of the sgRNA “seed” region by the Bridge helix. Although Arg78 and Arg165 also interact with the “seed” region, the R78A and R165A mutants showed only moderately decreased activities (FIG. 12C). These results may reflect that, whereas Arg66, Arg70 and Arg74 form bifurcated salt bridges with the sgRNA backbone, Arg78 and Arg165 each form a single salt bridge with the sgRNA backbone. A cluster of arginine residues on the Bridge helix is highly conserved among Cas9 proteins in the Type II-A-C systems (FIGS. 18, 19), suggesting that the Bridge helix is a universal structural feature of Cas9 proteins involved in recognition of the sgRNA and target DNA. This notion is supported by a previous observation that a strictly conserved arginine residue, equivalent to Arg70 of S. pyogenes Cas9, is essential for the function of Francisella novicida Cas9 in the Type II-B system (Sampson et al., 2013). Moreover, the alanine mutation of the repeat:anti-repeat duplex-interacting residues (Arg75 and Lys163) and stem loop 1-interacting residue (Arg69) resulted in decreased DNA cleavage activity (FIG. 12C), confirming the functional importance of the recognition of the repeat:anti-repeat duplex and stem loop 1 by Cas9.

The crRNA guide region is recognized by Cas9 in a sequence-independentmanner except for the U16-Arg447 and G18-Arg71 interactions (FIG. 12A,B). This base-specific G18-Arg71 interaction may partly explain theobserved preference of Cas9 for sgRNAs having guanines in the fourPAM-proximal guide sequences (Wang et al., 2014).

The REC1 and RuvC domains facilitate RNA-guided DNA targeting: Cas9recognizes the 20-bp DNA target site in a sequence-independent manner(FIG. 12A). The backbone phosphate groups of the target DNA (nucleotides1′, 9′-11′, 13′, and 20′) interact with the REC1 (Asn497, Trp659, Arg661and Gln695), RuvC (Gln926), and PI (Glu1108) domains. The C2′ atoms ofthe target DNA (nucleotides 5′, 7′, 8′, 11′, 19′, and 20′) form van derWaals interactions with the REC1 domain (Leu169, Tyr450, Met495, Met694and His698) and RuvC domain (Ala728) (FIG. 12D). These interactions arelikely to contribute towards discriminating between DNA vs. RNA targetsby Cas9. The terminal base pair of the guide:DNA duplex (G1:C20′) isrecognized by the RuvC domain via end-capping interactions (FIG. 12D);the nucleobases of sgRNA G1 and target DNA C20′ interact with the sidechains of Tyr1013 and Val1015, respectively, whereas the 2′-hydroxyl andphosphate groups of sgRNA G1 interact with Val1009 and Gln926,respectively. These end-capping interactions are consistent with theprevious observation that Cas9 recognizes a 17-20-bp guide:DNA duplex,and that extended guide sequences are degraded in cells and do notcontribute to improving sequence specificity (Mali et al., 2013a; Ran etal., 2013). Taken together, these structural findings explain theRNA-guided DNA targeting mechanism of Cas9.

The repeat:anti-repeat duplex is recognized by the REC and NUC lobes ina sequence-dependent manner: The repeat:anti-repeat duplex isextensively recognized by the REC and NUC lobes (FIG. 12A). The backbonephosphate groups of the crRNA repeat (nucleotides 24, 26, and 27) andanti-repeat (nucleotides 41, 45, 46, and 48-50) interact with the REC1domain (Arg115, His116, His160, Lys163, Arg340, and Arg403), PI domain(Lys1113), and Bridge helix (Lys76) (FIG. 12E, F). The 2′-hydroxylgroups of the crRNA repeat (nucleotides 22-24) and anti-repeat(nucleotides 43-45 and 47) hydrogen bond with the REC1 domain (Leu101,Ser104, Phe105, Ile135, Tyr359, and Gln402) and the PI domain (Ile1110and Tyr1131).

In contrast to the sequence-independent recognition of the guide region,there are sequence-dependent interactions between Cas9 and therepeat:anti-repeat duplex. The nucleobase of the flipped U44 issandwiched between the side chains of Tyr325 and His328, with its N3atom hydrogen bonded with the carbonyl group of Tyr325, while that ofunpaired G43 stacks with the side chain of Tyr359 and hydrogen bondswith the side chain of Asp364 (FIG. 12A, F). Finally, the nucleobases ofU23/A49 and A42/G43 hydrogen bond with the side chain of Arg1122 and themain-chain carbonyl group of Phe351, respectively.

In the present structure, the repeat:anti-repeat duplex is recognizedprimarily by the REC lobe, which is divergent in sequence and lengthamong Cas9 orthologs within the Type II-A-C systems (FIGS. 18, 19),consistent with the previous observation that Cas9 and sgRNA areinterchangeable only between closely related Type II systems (Fonfara etal., 2013). The three PAM-distal base pairs (C30:G39-A32:U37) are notrecognized by Cas9 and protrude from the complex (FIG. 12A), consistentwith a proposed model in which a Cas9-bound repeat:anti-repeat duplex isprocessed by the host RNase III enzyme (Deltcheva et al., 2011).

The nucleobases of G21 and U50 in the G21:U50 wobble pair stack with the terminal C20:G1′ pair in the guide:DNA duplex and the side chain of Tyr72 on Bridge helix, respectively, with the U50 O4 atom hydrogen bonded with the side chain of Arg75 (FIG. 12E). Notably, A51 adopts the syn-conformation, and is oriented in the direction opposite to U50 (FIGS. 11C and 12G). The nucleobase of A51 is sandwiched between the Phe1105 side chain in the PI domain and the U63 nucleobase in the linker, with its N7 and N1 atoms hydrogen bonded with the main-chain amide group of Phe1105 and the G62 2′-hydroxyl group in stem loop 1, respectively (FIG. 12G). Whereas the repeat:anti-repeat duplex is diverse in sequence and length among the Type II-A-C systems, the G21:U50 base pair is highly conserved among Cas9s (Fonfara et al., 2013), suggesting that this wobble pairing is a universal structural feature involved in the three-way junction formation.

To verify the sequence-dependent recognition of the repeat:anti-repeat duplex, Applicants evaluated the effect of repeat:anti-repeat mutations on Cas9-mediated DNA cleavage, and found multiple mutations that significantly reduce Cas9 activity (FIG. 12C). Notably, replacement of G43, which forms a base-specific hydrogen bond with Asp364, with adenine reduced Cas9 activity by over 3-fold. In addition, replacement of the flipped U44 in the repeat:anti-repeat duplex with adenine resulted in over a 5-fold drop in cleavage activity, whereas replacement of U44 with another pyrimidine base (cytosine) did not significantly affect cleavage activity (FIG. 12C). These results suggest that base-specific recognition of G43 and U44 could play an important role in sgRNA recognition by Cas9.

sgRNA stem loops 1-3 interact with Cas9: Stem loop 1 is primarilyrecognized by the REC lobe together with the PI domain (FIG. 12A). Thebackbone phosphate groups of stem loop 1 (nucleotides 52, 53, and 59-61)interact with the REC1 domain (Leu455, Ser460, Arg467, Thr472, andIle473), PI domain (Lys1123 and Lys1124), and Bridge helix (Arg70 andArg74), with the 2′-hydroxyl group of G58 hydrogen bonded with Leu455 inthe REC1 domain. A52 interacts with Phe1105 through a face-to-edge π-πstacking interaction (FIG. 12G), and the flipped U59 nucleobase hydrogenbonds with the side chain of Asn77 (FIG. 12H).

Stem loops 2 and 3, and the single-stranded linker are primarily recognized by the NUC lobe (FIG. 12A); this contrasts with stem loop 1 and the guide:DNA/repeat:anti-repeat duplexes, which are recognized by both the NUC and REC lobes. The backbone phosphate groups of the linker (nucleotides 63-65 and 67) interact with the RuvC domain (Glu57, Lys742, and Lys1097), PI domain (Thr1102), and Bridge helix (Arg69), with the 2′-hydroxyl groups of U64 and A65 hydrogen bonded with Glu57 and His721, respectively (FIG. 12I). The nucleobase of C67 hydrogen bonds with the main-chain amide group of Val1100 (FIG. 12I).

Stem loop 2 is recognized by Cas9 via the interactions between the NUClobe and the non-Watson-Crick A68:G81 pair, which is formed by direct(between the A68 N6 and G81 O6 atoms) and water-mediated (between theA68 N1 and G81 N1 atoms) hydrogen-bonding interactions (FIG. 12J). Thenucleobases of A68 and G81 contact the side chains of Ser1351 andTyr1356, respectively, with the A68:G81 pair recognized by Thr1358 via awater-mediated hydrogen bond (FIG. 12J). The 2′-hydroxyl group of A68hydrogen bonds with the side chain of His1349, and the 2-amino group ofG81 hydrogen bonds with the main-chain carbonyl group of Lys33 (FIG.12J).

Stem loop 3 interacts with the NUC lobe more extensively relative tostem loop 2 (FIG. 12K). The backbone phosphate groups of C91 and G92interact with the RuvC domain (Arg40 and Lys44) (FIG. 12K), while thenucleobases of G89 and U90 hydrogen bond with Gln1272 andGlu1225/Ala1227, respectively (FIG. 12K). The nucleobases of A88 and C91are recognized by the side chain of Asn46 via multiple hydrogen-bondinginteractions (FIG. 12K).

Structural flexibility of Cas9 and sgRNA: Although the HNH domain cleaves the complementary strand of the target DNA at a position three nucleotides upstream of the PAM sequence (Gasiunas et al., 2012; Jinek et al., 2012), in the present structure the HNH domain is positioned away from the scissile phosphate group of the bound complementary strand (FIG. 13A). A structural comparison of Mol A and Mol B provided mechanistic insights into the complementary strand cleavage by the HNH domain. In Mol A, the HNH domain is followed by the α40 helix of the RuvC domain, which is connected with the α41 helix by an α40-α41 linker (residues 919-925) (FIG. 13A). Whereas in Mol A residues 913-925 form the C-terminal portion of the α40 helix and the α40-α41 linker, in Mol B these residues form an extended α-helix, which is directed toward the cleavage site of the complementary strand (FIG. 13A). These observations suggest that the HNH domain can approach and cleave the target DNA through conformational changes in the segment connecting the HNH and RuvC domains.

Moreover, the structural comparison revealed a conformationalflexibility between the REC and NUC lobes (FIG. 13B). Compared to Mol A,Mol B adopts a more open conformation, in which the two lobes arerotated by 15° at a hinge loop between Bridge helix and the strand β5 inthe RuvC domain (FIG. 13B). The bound sgRNA also undergoes anaccompanying conformational change at the single-stranded linker, whichinteracts with the hinge loop (FIG. 13C). Applicants also observed anaccompanying displacement of the β17-β18 loop of the PI domain, whichinteracts with the repeat:anti-repeat duplex and the α2-α3 loop of theREC1 domain (FIG. 13B). Notably, there is no direct contact between thetwo lobes in the present structure, except for the interactions betweenthe α2-α3 and β17-β18 loops (FIG. 13D), suggesting that Cas9 is highlyflexible in the absence of the sgRNA. The flexible nature of Cas9 islikely to play a role in the assembly of the Cas9-sgRNA-DNA ternarycomplex.

The crystal structure of Cas9 in complex with guide RNA and target DNAreveals that the 20-bp heteroduplex formed by the crRNA guide region andthe complementary strand of the target DNA is accommodated in thepositively-charged groove at the interface between the REC and NUC lobesof Cas9, with the scissile phosphate group of the target properlypositioned for cleavage by the HNH domain. Although the presentstructure does not contain the non-complementary DNA strand, theposition of the bound complementary strand suggests that the scissilephosphate of the non-complementary strand is located in the vicinity ofthe active site of the RuvC domain, consistent with previous biochemicaldata (Gasiunas et al., 2012; Jinek et al., 2012). Furthermore,Applicants' structural and functional analyses indicate that the PIdomain participates in the recognition of the PAM sequence of thenon-complementary strand.

Based on these observations, Applicants propose a model for theCas9-catalyzed RNA-guided DNA cleavage (FIG. 14). Cas9 recognizes thePAM-proximal guide region and repeat:anti-repeat duplex of sgRNA to forma Cas9-sgRNA binary complex. The binary complex subsequently recognizesthe DNA sequence complementary to the 20-nt guide region of the boundsgRNA, forming the final Cas9-sgRNA-target DNA ternary complex. Duringthe ternary complex formation, the PI domain recognizes the PAM sequenceof the non-complementary strand, facilitating the R-loop formation. Uponassembly of the ternary complex, the mobile HNH domain approaches andcleaves the complementary strand in the guide:DNA duplex, whereas theRuvC domain cleaves the single-stranded, non-complementary strand.

Applicants' crystal structure provides a critical step towards understanding the molecular mechanism of RNA-guided DNA targeting by Cas9. Further structural and functional studies with S. pyogenes Cas9 or related orthologs, including the structural determination of the Cas9-sgRNA-DNA ternary complex containing the non-complementary strand, may be important for illuminating details such as Cas9-mediated recognition of PAM sequences on the target DNA or mismatch tolerance within the sgRNA:DNA duplex. However, the present structural and functional analyses already provide a useful scaffold for rational engineering of Cas9-based genome modulating technologies. Applicants reported, for example, an S. pyogenes Cas9 truncation mutant (FIG. 9B) that will facilitate packaging of Cas9 into size-constrained viral vectors for in vivo and therapeutic applications. Similarly, future engineering of the PI domain may allow for programming of PAM specificity, improving target site recognition fidelity, and increasing the versatility of the Cas9 genome engineering platform.

Experimental Procedures

Protein preparation: The gene encoding full-length S. pyogenes Cas9 (residues 1-1368) was cloned between the NdeI and XhoI sites of the modified pCold-GST vector (TaKaRa). The protein was expressed at 20° C. in Escherichia coli Rosetta 2 (DE3) (Novagen), and was purified using Ni-NTA Superflow resin (QIAGEN). The eluted protein was incubated overnight at 4° C. with TEV protease to remove the GST-tag, and further purified by chromatography on Ni-NTA, Mono S (GE Healthcare) and HiLoad Superdex 200 16/60 (GE Healthcare) columns. The SeMet-labeled protein was prepared using a protocol similar to that for the native protein. The sgRNA was in vitro transcribed by T7 polymerase using a PCR-amplified template, and was purified by 10% denaturing polyacrylamide gel electrophoresis. The target DNA was purchased from Sigma-Aldrich. The purified Cas9 protein was mixed with sgRNA and DNA (molar ratio 1:1.5:2), and then the complex was purified using a Superdex 200 Increase column (GE Healthcare) in a buffer containing 10 mM Tris-HCl, pH 8.0, 150 mM NaCl and 1 mM DTT.
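
The 1:1.5:2 molar ratio of Cas9, sgRNA and DNA translates into component amounts by simple proportion. The following sketch is illustrative only: the function name `complex_amounts_nmol` and the 10 nmol protein quantity in the usage line are hypothetical and are not taken from the procedure above.

```python
from typing import Dict, Tuple


def complex_amounts_nmol(cas9_nmol: float,
                         ratio: Tuple[float, float, float] = (1.0, 1.5, 2.0)
                         ) -> Dict[str, float]:
    """Scale a Cas9:sgRNA:DNA molar ratio to a given amount of Cas9 (nmol)."""
    cas9_part, sgrna_part, dna_part = ratio
    if cas9_nmol <= 0 or cas9_part <= 0:
        raise ValueError("Cas9 amount and its ratio part must be positive")
    scale = cas9_nmol / cas9_part  # nmol per ratio unit
    return {
        "Cas9": cas9_part * scale,
        "sgRNA": sgrna_part * scale,
        "DNA": dna_part * scale,
    }


if __name__ == "__main__":
    # Hypothetical quantity: 10 nmol of purified Cas9 protein.
    print(complex_amounts_nmol(10.0))
```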

Crystallography: The purified Cas9-sgRNA-DNA complex was crystallized at 20° C. by the hanging-drop vapor diffusion method. Crystals were obtained by mixing 1 μl of complex solution (A_(260 nm)=15) and 1 μl of reservoir solution (12% PEG 3,350, 100 mM Tris-HCl, pH 8.0, 200 mM ammonium acetate, 150 mM NaCl and 100 mM NDSB-256). The SeMet-labeled protein was crystallized under conditions similar to those for the native protein. X-ray diffraction data were collected at 100 K on the beamlines BL32XU and BL41XU at SPring-8 (Hyogo, Japan). The crystals were cryoprotected in reservoir solution supplemented with 25% ethylene glycol. X-ray diffraction data were processed using XDS (Kabsch, 2010). The structure was determined by the SAD method, using the 2.8 Å resolution data from the SeMet-labeled crystal. Forty of the potential 44 Se atoms were located using SHELXD (Sheldrick, 2008) and autoSHARP (de La Fortelle and Bricogne, 1997). The initial phases were calculated using autoSHARP, and further improved by 2-fold NCS averaging using DM (Winn et al., 2011). The model was automatically built using PHENIX AutoSol (Adams et al., 2002), followed by manual model building using COOT (Emsley and Cowtan, 2004) and refinement using PHENIX (Adams et al., 2002). The resulting model was further refined using the native 2.4 Å resolution data.

Cell culture and transfection: Human embryonic kidney (HEK) cell line 293FT (Life Technologies) or mouse Neuro 2a (Sigma-Aldrich) cell line was maintained in Dulbecco's modified Eagle's Medium (DMEM) supplemented with 10% fetal bovine serum (HyClone), 2 mM GlutaMAX (Life Technologies), 100 U/ml penicillin, and 100 μg/ml streptomycin at 37° C. with 5% CO₂ incubation. Cells were seeded onto 24-well plates (Corning) at a density of 120,000 cells/well, 24 h prior to transfection. Cells were transfected using Lipofectamine 2000 (Life Technologies) at 70-80% confluency following the manufacturer's recommended protocol. A total of 400 ng of Cas9 plasmid and 100 ng of U6::sgRNA PCR product were transfected.

SURVEYOR nuclease assay for genome modification: 293FT cells were transfected with DNA as described above. Cells were incubated at 37° C. for 72 h post-transfection prior to genomic DNA extraction. Genomic DNA was extracted using the QuickExtract DNA Extraction Solution (Epicentre) following the manufacturer's protocol. Briefly, pelleted cells were resuspended in QuickExtract solution and incubated at 65° C. for 15 min, 68° C. for 15 min, and 98° C. for 10 min.

The genomic region flanking the CRISPR target site for each gene was PCR amplified, and products were purified using QiaQuick Spin Column (Qiagen) following the manufacturer's protocol. 400 ng total of the purified PCR products were mixed with 2 μl 10× Taq DNA Polymerase PCR buffer (Enzymatics) and ultrapure water to a final volume of 20 μl, and subjected to a re-annealing process to enable heteroduplex formation: 95° C. for 10 min, 95° C. to 85° C. ramping at −2° C./s, 85° C. to 25° C. at −0.25° C./s, and 25° C. hold for 1 min. After re-annealing, products were treated with SURVEYOR nuclease and SURVEYOR enhancer S (Transgenomics) following the manufacturer's recommended protocol, and analyzed on 4-20% Novex TBE polyacrylamide gels (Life Technologies). Gels were stained with SYBR Gold DNA stain (Life Technologies) for 30 min and imaged with a Gel Doc gel imaging system (Bio-Rad). Quantification was based on relative band intensities. Indel percentage was determined by the formula, 100×(1−(1−(b+c)/(a+b+c))^(1/2)), where a is the integrated intensity of the undigested PCR product, and b and c are the integrated intensities of each cleavage product.
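
The indel formula can be evaluated directly from the three integrated band intensities. The sketch below is a minimal illustration: the function name `indel_percentage` and the intensity values in the usage line are hypothetical, and the input checks are an added assumption rather than part of the described procedure.

```python
from math import sqrt


def indel_percentage(a: float, b: float, c: float) -> float:
    """Indel percentage from SURVEYOR assay band intensities.

    a    : integrated intensity of the undigested PCR product
    b, c : integrated intensities of the two cleavage products
    Implements 100 x (1 - (1 - (b + c) / (a + b + c)) ** (1/2)).
    """
    if min(a, b, c) < 0:
        raise ValueError("band intensities must be non-negative")
    total = a + b + c
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (b + c) / total
    return 100.0 * (1.0 - sqrt(1.0 - fraction_cleaved))


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units.
    print(indel_percentage(a=64.0, b=18.0, c=18.0))
```

With no cleavage products (b = c = 0) the expression evaluates to 0, and with no undigested product (a = 0) it evaluates to 100.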

Western blot: HEK 293FT cells were transfected and lysed in 1× RIPA buffer (Sigma-Aldrich) supplemented with Protease Inhibitor (Roche). Lysates were loaded onto Bolt 4-12% Bis-Tris Plus Gel (Invitrogen) and transferred to nitrocellulose membranes. Membranes were blocked in Tris-buffered saline containing 0.1% Tween-20 and 5% blocking agent (G-Biosciences). Membranes were probed with rabbit anti-FLAG (1:5,000, Abcam), HRP-conjugated anti-GAPDH (1:5,000, Cell Signaling Technology), and HRP-conjugated anti-rabbit (1:1,000). Blots were visualized on a Gel Doc XR+ System (Bio-Rad).

Sequence Information:

Italic: 3XFLAG sequence Underlined: NLS sequences Wildtype SpCas9ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGCAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTC
CTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGAT
CAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ IDNO: 76) Sp_del(97-150)ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCG
TGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCC
CTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 77) Sp_del(175-307)ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACA
TACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGA
ACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 78)Sp_del(312-409)ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTG
GGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTT
TGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 79)Sp_del(1098-end)ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTT
CATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 80) 
St3Cas9ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCACCAAGCCCTACAGCATCGGCCTGGACATCGGCACCAATAGCGTGGGCTGGGCCGTGACCACCGACAACTACAAGGTGCCCAGCAAGAAAATGAAGGTGCTGGGCAACACCTCCAAGAAGTACATCAAGAAAAACCTGCTGGGCGTGCTGCTGTTCGACAGCGGCATTACAGCCGAGGGCAGACGGCTGAAGAGAACCGCCAGACGGCGGTACACCCGGCGGAGAAACAGAATCCTGTATCTGCAAGAGATCTTCAGCACCGAGATGGCTACCCTGGACGACGCCTTCTTCCAGCGGCTGGACGACAGCTTCCTGGTGCCCGACGACAAGCGGGACAGCAAGTACCCCATCTTCGGCAACCTGGTGGAAGAGAAGGCCTACCACGACGAGTTCCCCACCATCTACCACCTGAGAAAGTACCTGGCCGACAGCACCAAGAAGGCCGACCTGAGACTGGTGTATCTGGCCCTGGCCCACATGATCAAGTACCGGGGCCACTTCCTGATCGAGGGCGAGTTCAACAGCAAGAACAACGACATCCAGAAGAACTTCCAGGACTTCCTGGACACCTACAACGCCATCTTCGAGAGCGACCTGTCCCTGGAAAACAGCAAGCAGCTGGAAGAGATCGTGAAGGACAAGATCAGCAAGCTGGAAAAGAAGGACCGCATCCTGAAGCTGTTCCCCGGCGAGAAGAACAGCGGAATCTTCAGCGAGTTTCTGAAGCTGATCGTGGGCAACCAGGCCGACTTCAGAAAGTGCTTCAACCTGGACGAGAAAGCCAGCCTGCACTTCAGCAAAGAGAGCTACGACGAGGACCTGGAAACCCTGCTGGGATATATCGGCGACGACTACAGCGACGTGTTCCTGAAGGCCAAGAAGCTGTACGACGCTATCCTGCTGAGCGGCTTCCTGACCGTGACCGACAACGAGACAGAGGCCCCACTGAGCAGCGCCATGATTAAGCGGTACAACGAGCACAAAGAGGATCTGGCTCTGCTGAAAGAGTACATCCGGAACATCAGCCTGAAAACCTACAATGAGGTGTTCAAGGACGACACCAAGAACGGCTACGCCGGCTACATCGACGGCAAGACCAACCAGGAAGAGGAAGATTTCTATGTGTACCTGAAGAAGCTGCTGGCCGAGTTCGAGGGGGCCGACTACTTTCTGGAAAAAATCGACCGCGAGGATTTCCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCTACCAGATCCATCTGCAGGAAATGCGGGCCATCCTGGACAAGCAGGCCAAGTTCTACCCATTCCTGGCCAAGAACAAAGAGCGGATCGAGAAGATCCTGACCTTCCGCATCCCTTACTACGTGGGCCCCCTGGCCAGAGGCAACAGCGATTTTGCCTGGTCCATCCGGAAGCGCAATGAGAAGATCACCCCCTGGAACTTCGAGGACGTGATCGACAAAGAGTCCAGCGCCGAGGCCTTCATCAACCGGATGACCAGCTTCGACCTGTACCTGCCCGAGGAAAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGACATTCAATGTGTATAACGAGCTGACCAAAGTGCGGTTTATCGCCGAGTCTATGCGGGACTACCAGTTCCTGGACTCCAAGCAGAAAAAGGACATCGTGCGGCTGTACTTCAAGGACAAGCGGAAAGTGACCGATAAGGACATCATCGAGTACCTGCACGCCATCTACGGCTACGATGGCATCGAGCTGAAGGGCATCGAGAAGCAGTTCAACTCCAGCCTGAGCACATACCACGACCTGCTGAACATTATCAACGACAAAGAATTTCTGGACGACTCCAGCAACGAGGCCATCATCGAAGAGATCATCCACACCCTGACCA
TCTTTGAGGACCGCGAGATGATCAAGCAGCGGCTGAGCAAGTTCGAGAACATCTTCGACAAGAGCGTGCTGAAAAAGCTGAGCAGACGGCACTACACCGGCTGGGGCAAGCTGAGCGCCAAGCTGATCAACGGCATCCGGGACGAGAAGTCCGGCAACACAATCCTGGACTACCTGATCGACGACGGCATCAGCAACCGGAACTTCATGCAGCTGATCCACGACGACGCCCTGAGCTTCAAGAAGAAGATCCAGAAGGCCCAGATCATCGGGGACGAGGACAAGGGCAACATCAAAGAAGTCGTGAAGTCCCTGCCCGGCAGCCCCGCCATCAAGAAGGGAATCCTGCAGAGCATCAAGATCGTGGACGAGCTCGTGAAAGTGATGGGCGGCAGAAAGCCCGAGAGCATCGTGGTGGTGGTGGAAATGGCTAGAGAGAACCAGTACACCAATCAGGGCAAGAGCAACAGCCAGCAGAGACTGAAGAGACTGGAAAAGTCCCTGAAAGAGCTGGGCAGCAAGATTCTGAAAGAGAATATCCCTGCCAAGCTGTCCAAGATCGACAACAACGCCCTGCAGAACGACCGGCTGTACCTGTACTACCTGCAGAATGGCAAGGACATGTATACAGGCGACGACCTGGATATCGACCGCCTGAGCAACTACGACATCGACCATATTATCCCCCAGGCCTTCCTGAAAGACAACAGCATTGACAACAAAGTGCTGGTGTCCTCCGCCAGCAACCGCGGCAAGTCCGATGATGTGCCCAGCCTGGAAGTCGTGAAAAAGAGAAAGACCTTCTGGTATCAGCTGCTGAAAAGCAAGCTGATTAGCCAGAGGAAGTTCGACAACCTGACCAAGGCCGAGAGAGGCGGCCTGAGCCCTGAAGATAAGGCCGGCTTCATCCAGAGACAGCTGGTGGAAACCCGGCAGATCACCAAGCACGTGGCCAGACTGCTGGATGAGAAGTTTAACAACAAGAAGGACGAGAACAACCGGGCCGTGCGGACCGTGAAGATCATCACCCTGAAGTCCACCCTGGTGTCCCAGTTCCGGAAGGACTTCGAGCTGTATAAAGTGCGCGAGATCAATGACTTTCACCACGCCCACGACGCCTACCTGAATGCCGTGGTGGCTTCCGCCCTGCTGAAGAAGTACCCTAAGCTGGAACCCGAGTTCGTGTACGGCGACTACCCCAAGTACAACTCCTTCAGAGAGCGGAAGTCCGCCACCGAGAAGGTGTACTTCTACTCCAACATCATGAATATCTTTAAGAAGTCCATCTCCCTGGCCGATGGCAGAGTGATCGAGCGGCCCCTGATCGAAGTGAACGAAGAGACAGGCGAGAGCGTGTGGAACAAAGAAAGCGACCTGGCCACCGTGCGGCGGGTGCTGAGTTATCCTCAAGTGAATGTCGTGAAGAAGGTGGAAGAACAGAACCACGGCCTGGATCGGGGCAAGCCCAAGGGCCTGTTCAACGCCAACCTGTCCAGCAAGCCTAAGCCCAACTCCAACGAGAATCTCGTGGGGGCCAAAGAGTACCTGGACCCTAAGAAGTACGGGTACGGCGGATACGCCGGCATCTCCAATAGCTTCACCGTGCTCGTGAAGGGCACAATCGAGAAGGGCGCTAAGAAAAAGATCACAAACGTGCTGGAATTTCAGGGGATCTCTATCCTGGACCGGATCAACTACCGGAAGGATAAGCTGAACTTTCTGCTGGAAAAAGGCTACAAGGACATTGAGCTGATTATCGAGCTGCCTAAGTACTCCCTGTTCGAACTGAGCGACGGCTCCAGACGGATGCTGGCCTCCATCCTGTCCACCAACAACAAGCGGGGCGAGATCCACAAGGGAAACCAGATCTTCCTGAGCCAGAAATTTGTGAAACTGCTGTACCACGCCAAGCGGATCTCCAACACCATCAATGAGAACCACCGGAAATACGTGGAAAACCACAAGAAAGAGTTTGAGGAACTGTTCTACTACATCCTGGAG
TTCAACGAGAACTATGTGGGAGCCAAGAAGAACGGCAAACTGCTGAACTCCGCCTTCCAGAGCTGGCAGAACCACAGCATCGACGAGCTGTGCAGCTCCTTCATCGGCCCTACCGGCAGCGAGCGGAAGGGACTGTTTGAGCTGACCTCCAGAGGCTCTGCCGCCGACTTTGAGTTCCTGGGAGTGAAGATCCCCCGGTACAGAGACTACACCCCCTCTAGTCTGCTGAAGGACGCCACCCTGATCCACCAGAGCGTGACCGGCCTGTACGAAACCCGGATCGACCTGGCTAAGCTGGGCGAGGGAAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 81)SpCas9(C80L, C574A)ATGGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCctgTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCC
GCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGgagTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTT
CGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGAC (SEQ ID NO: 82) Sp_St3 Cas9 chimera (St3 in bold)ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGAAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCG
CATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGGGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGTGGAAGAACAGAACCACGGCCTGGATCGGGGCAAGCCCAAGGGCC
TGTTCAACGCCAACCTGTCCAGCAAGCCTAAGCCCAACTCCAACGAGAATCTCGTGGGGGCCAAAGAGTACCTGGACCCTAAGAAGTACGGGTACGGCGGATACGCCGGCATCTCCAATAGCTTCACCGTGCTCGTGAAGGGCACAATCGAGAAGGGCGCTAAGAAAAAGATCACAAACGTGCTGGAATTTCAGGGGATCTCTATCCTGGACCGGATCAACTACCGGAAGGATAAGCTGAACTTTCTGCTGGAAAAAGGCTACAAGGACATTGAGCTGATTATCGAGCTGCCTAAGTACTCCCTGTTCGAACTGAGCGACGGCTCCAGACGGATGCTGGCCTCCATCCTGTCCACCAACAACAAGCGGGGCGAGATCCACAAGGGAAACCAGATCTTCCTGAGCCAGAAATTTGTGAAACTGCTGTACCACGCCAAGCGGATCTCCAACACCATCAATGAGAACCACCGGAAATACGTGGAAAACCACAAGAAAGAGTTTGAGGAACTGTTCTACTACATCCTGGAGTTCAACGAGAACTATGTGGGAGCCAAGAAGAACGGCAAACTGCTGAACTCCGCCTTCCAGAGCTGGCAGAACCACAGCATCGACGAGCTGTGCAGCTCCTTCATCGGCCCTACCGGCAGCGAGCGGAAGGGACTGTTTGAGCTGACCTCCAGAGGCTCTGCCGCCGACTTTGAGTTCCTGGGAGTGAAGATCCCCCGGTACAGAGACTACACCCCCTCTAGTCTGCTGAAGGACGCCACCCTGATCCACCAGAGCGTGACCGGCCTGTACGAAACCCGGATCGACCTGGCTAAGCTGGGCGAGGGAAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 83)St3_Sp Cas9 chimera (St3 in bold)ATGGCCCCAAAGAAGAAGGGGAAGGTGGGTATCCACGGAGTGCCAGCAGCCatgACCAGGCCTACAGCATGGGCCTGGACATGGGCACCAATAGGGTGGGCTGGGCCGTGACCACCGACAACTACAAGGTGCCCAGCAAGAAAATGAAGGTGCTGGGCAACACCTCCAAGAAGTACATCAAGAAAAACCTGCTGGGCGTGCTGCTGTTCGACAGGGGCATTACAGCCGAGGGCAGACGGCTGAAGAGAACCGCCAGACGGCGGTACACCCGGCGGAGAAACAGAATGCTGTATCTGCAAGAGATCTTCAGCACCGAGATGGCTACCCTGGACGACGCCTTCTTCCAGCGGCTGGACGACAGCTTCCTGGTGGCCGACGACAAGGGGGACAGCAAGTACCGCATCTTCGGCAACCTGGTGGAAGAGAAGGCCTACCACGACGAGTTCCGCACCATCTACCACGTGAGAAAGTACCTGGCCGACAGCACCAAGAAGGCCGACCTGAGACTGGTGTATCTGGCCGTGGCCCACATGATCAAGTACCGGGGCCACTTGCTGATCGAGGGCGAGTTCAACAGCAAGAACAACGACATCCAGAAGAACTTCCAGGACTTCCTGGACACCTACAACGCCATCTTCGAGAGCGACCTGTGCCTGGAAAACAGCAAGCAGCTGGAAGAGATCGTGAAGGACAAGATCAGCAAGCTGGAAAAGAAGGACCGCATCCTGAAGCTGTTGCCGGGCGAGAAGAACAGGGGAATCTTCAGCGAGTTTCTGAAGCTGATCGTGGGCAACCAGGCCGACTTCAGAAAGTGCTTCAACCTGGACGAGAAAGGCAGGCTGCACTTCAGCAAAGAGAGCTACGACGAGGACCTGGAAACCCTGCTGGGATATATCGGCGACGACTACAGCGACGTGTTCCTGAAGGCCAAGAAGCTGTACGACGCTATCCTGCTGAGCGGCTTCCTGACCGTGACCGACAACGAGACAGAGGCCCCACTGAGCAGCGCCATGATTAAGCGGTACAACGAGCACAAAGAGGATCTGGCTCTGCTGAAAGAGTACATCCGG
AACATCAGCCTGAAAACCTACAATGAGGTGTTCAAGGACGACACCAAGAACGGCTACGCCGGCTACATCGACGGCAAGACCAACCAGGAAGAGGAAGATTTCTATGTGTACCTGAAGAAGCTGCTGGCCGAGTTCGAGGGGGCCGACTACTTTCTGGAAAAAATCGACCGCGAGGATTTCCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCTACCAGATCCATCTGCAGGAAATGCGGGCCATCCTGGACAAGCAGGCCAAGTTCTACCCATTCCTGGCCAAGAACAAAGAGCGGATCGAGAAGATCCTGACCTTCCGCATCCCTTACTACGTGGGCCCCCTGGCCAGAGGCAACAGCGATTTTGCCTGGTCCATCCGGAAGCGCAATGAGAAGATCACCCCCTGGAACTTCGAGGACGTGATCGACAAAGAGTCCAGCGCCGAGGCCTTCATCAACCGGATGACCAGCTTCGACCTGTACCTGCCCGAGGAAAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGACATTCAATGTGTATAACGAGCTGACCAAAGTGCGGTTTATCGCCGAGTCTATGCGGGACTACCAGTTCCTGGACTCCAAGCAGAAAAAGGACATCGTGCGGCTGTACTTCAAGGACAAGCGGAAAGTGACCGATAAGGACATCATCGAGTACCTGCACGCCATCTACGGCTACGATGGCATCGAGCTGAAGGGCATCGAGAAGCAGTTCAACTCCAGCCTGAGCACATACCACGACCTGCTGAACATTATCAACGACAAAGAATTTCTGGACGACTCCAGCAACGAGGCCATCATCGAAGAGATCATCCACACCCTGACCATCTTTGAGGACCGCGAGATGATCAAGCAGCGGCTGAGCAAGTTCGAGAACATCTTCGACAAGAGCGTGCTGAAAAAGCTGAGCAGACGGCACTACACCGGCTGGGGCAAGCTGAGCGCCAAGCTGATCAACGGCATCCGGGACGAGAAGTCCGGCAACACAATCCTGGACTACCTGATCGACGACGGCATCAGGAACCGGAACTTCATGCAGCTGATCCACGACGACGCCGTGAGCTTCAAGAAGAAGATCCAGAAGGCCCAGATCATCGGGGACGAGGACAAGGGCAACATCAAAGAAGTGGTGAAGTGCCTGCCGGGCAGCCGCGCCATCAAGAAGGGAATCCTGCAGAGCATCAAGATCGTGGACGAGCTCGTGAAAGTGATGGGCGGCAGAAAGCCCGAGAGCATCGTGGTGGTGGTGGAAATGGCTAGAGAGAACCAGTACACCAATCAGGGCAAGAGCAACAGGCAGGAGAGACTGAAGAGACTGGAAAAGTGCCTGAAAGAGCTGGGCAGCAAGATTCTGAAAGAGAATATCCCTGCCAAGGTGTGCAAGATGGACAACAACGGCCTGCAGAACGACCGGCTGTACCTGTACTACCTGCAGAATGGCAAGGACATGTATACAGGCGACGACCTGGATATCGACCGGCTGAGGAACTACGACATCGACCATATTATCCGCCAGGCCTTCCTGAAAGACAACAGCATTGACAACAAAGTGCTGGTGTCCTCCGCCAGCAACCGGGGCAAGTCCGATGATGTGCCCAGCCTGGAAGTGGTGAAAAAGAGAAAGACCTTCTGGTATCAGGTGGTGAAAAGCAAGCTGATTAGCCAGAGGAAGTTCGACAACCTGACCAAGGCCGAGAGAGGCGGCCTGAGCCCTGAAGATAAGGCCGGCTTCATCCAGAGACAGCTGGTGGAAACCGGGCAGATCACCAAGCACGTGGCCAGACTGCTGGATGAGAAGTTTAACAACAAGAAGGACGAGAACAACCGGGCCGTGCGGACCGTGAAGATCATCACCCTGAAGTCCACCCTGGTGTCCCAGTTCCGGAAGGACTTCGAGCTGTATAAAGTGCGCGAGATCAATGACTTTCACCACGCCCACGACGGCTACCTGAATGCCGTGGTGGCTTCCGC
CCTGCTGAAGAAGTACCCTAAGCTGGAACCCGAGTTCGTGTACGGCGACTACCCCAAGTACAACTGGTTCAGAGAGGGGAAGTCCGCCACCGAGAAGGTGTACTTCTACTCCAACATCATGAATATCTTTAAGAAGTCCATCTCCCTGGCCGATGGCAGAGTGATCGAGGGGCCGCTGATCGAAGTGAACGAAGAGACAGGCGAGAGCGTGTGGAACAAAGAAAGCGACCTGGCCACCGTGCGGGGGGTGGTGAGTTATGGTGAAGTGAATGTGGTGAAGAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 84)SpCas9 nickasesMutated residues (changed to GCC) bolded in order: D10, E762, N863, H983, 
D986ATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGCAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTT
GAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGC
CTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 85)SpCas9 point mutantsMutated residues (changed to GCC) bolded in order: R63A, R66A, R69A, R70A, R74A, R75A, R78A, K163A, R165A, K510AATGGACTATAAGGACCACGACGGAGACTACAAGGATCATGATATTGATTACAAAGACGATGACGATAAGATGGCCCCAAAGAAGAAGCGGAAGGTCGGTATCCACGGAGTCCCAGCAGCCGACAAGAAGTACAGCATCGGCCTGGACATCGGCACCAACTCTGTGGGCTGGGCCGTGATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCGACCGGCACAGCATCAAGAAGAACCTGATCGGAGCCCTGCTGTTCGACAGCGGCGAAACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAAGAAGAGATACACCAGACGGAAGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGACGACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCACGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGTACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGACCTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCTGATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGCTGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTGGACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCTGATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGCAACCTGATTGCCCTGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCCAAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCAGATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCATCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGCGCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGCTCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCAAGAACGGCTACGCCGGCTACATTGACGGCGGAGCCAGCCAGGAAGAGTTCTACAAGTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGCTGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCCCCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTTACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATCCCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAGAAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGCGCTTCC
GCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAACGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTATAACGAGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGCGGCGAGCAGAAAAAGGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGACCGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGGAAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTGCTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCTGGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAACGGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGGCGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGGACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAACAGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCAGAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGGCCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGAGCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCAGAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCGGATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTGGAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCGGGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGGACCATATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGACAACAAGGTGCTGACCAGAAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCGTGAAGAAGATGAAGAACTACTGGCGGCAGCTGCTGAACGCCAAGCTGATTACCCAGAGAAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATAAGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTGGCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAATGACAAGCTGATCCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGGATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCCTACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAGCGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGAGCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATGAACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCTGATCGAGACAAACGGCGAAACCGGGGAGATCGTGTGGGATAAGGGCCGGGATTTTGCCACCGTGCGGAAAGTGCTGAGCATGCCCCAAGTGAATATCGTGAAAAAGACCGAGGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGATAAGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAGCCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTC
CAAGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGCTTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAAGGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGAAGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCCCTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCTCCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCACTACCTGGACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGCTAATCTGGACAAAGTGCTGTCCGCCTACAACAAGCACCGGGATAAGCCCATCAGAGAGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCCGCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGAGGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGATCGACCTGTCTCAGCTGGGAGGCGACAAAAGGCCGGCGGCCACGAAAAAGGCCGGCCAGGCAAAAAAGAAAAAG (SEQ ID NO: 86)sgRNA sequences: guide sequence underlined +83GAGUCCGAGCAGAAGAAGAAGCCCCAGAGCUAGAAAUAGCAAGUUGGGGUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU(SEQ ID NO: 87) +47GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUUU (SEQ ID NO: 88)+67GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAGUGUUUU (SEQ ID NO: 89)mutate proximal crRNA: tracrRNA duplexGAGUCCGAGCAGAAGAAGAAGCCCCAGAGCUAGAAAUAGCAAGUUGGGGUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUU(SEQ ID NO: 90) truncate distal crRNA: tracrRNA duplexGAGUCCGAGCAGAAGAAGAAGUUUUAGAGACAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU (SEQ ID NO:91) remove crRNA: tracrRNA duplex bulgeGAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCUUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU (SEQID NO: 92) abolish stemloop 1GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAUUCUAGUAAGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU(SEQ ID NO: 93) mutate stemloop 1GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGCCAUGUGCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU(SEQ ID NO: 94) truncate linkerGAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU (SEQ IDNO: 95) replace stempllo 
2
GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCACGCCGAAAGGCGGGCACCGAGUCGGUGCUUUU (SEQ ID NO: 96)
lengthen stemloop 2
GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAGAAAUCAAGUGGCACCGAGUCGGUGCUUU (SEQ ID NO: 97)
mutate stemloop 3
GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCCCGCGGCGGGGCUUUU (SEQ ID NO: 98)
lengthen stemloop 3
GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAAAGUUUCGGUGCUUUU (SEQ ID NO: 99)
reconstructed sgRNA
GAGUCCGAGCAGAAGAAGAAGCCCAGAGCAUUAGCAAGUUGGGGUAAGCCAUGUGCGUUAUCAGGGCACCAGCCCGGCACCGAGUCGGUGCUUUU (SEQ ID NO: 100)
G43A
GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAACUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU (SEQ ID NO: 101)
U44G
GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGGUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU (SEQ ID NO: 102)
U44C
GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGCUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU (SEQ ID NO: 103)

Primers (SEQ ID NOS 104-107, respectively, in order of appearance)

Cas9    Target                  PAM      SURVEYOR primer F       SURVEYOR primer R
Sp      GAGTCCGAGCAGAAGAAGAA    GGG      CCATCCCCTTCTGTGAATGT    GGAGATTGGAGACACGGAGA
St3     GCTCCCATCACATCAACCGG    TGGCG    same                    same
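
The relationship between the sgRNA guide sequences and the primer table above can be illustrated with a minimal Python sketch. It back-transcribes the first 20 nucleotides of an sgRNA and compares them with the listed Sp target, then checks each listed PAM against a consensus pattern. The helper names (rna_to_dna, guide_matches_target, pam_matches) are hypothetical, and the consensus patterns NGG (Sp) and NGGNG (St3) are assumptions that this section does not itself state; the sequences are copied from the listing.

```python
"""Sketch: relate an sgRNA guide to its DNA target and check a PAM pattern."""

# Sequences copied from the listing (SEQ ID NO: 88 and the primer table).
SGRNA_PLUS_47 = "GAGUCCGAGCAGAAGAAGAAGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUUU"
SP_TARGET = "GAGTCCGAGCAGAAGAAGAA"
SP_PAM = "GGG"
ST3_PAM = "TGGCG"

# Assumed consensus PAM patterns (not stated in this section); N = any base.
ASSUMED_SP_PAM_PATTERN = "NGG"
ASSUMED_ST3_PAM_PATTERN = "NGGNG"


def rna_to_dna(rna: str) -> str:
    """Back-transcribe an RNA string to DNA by replacing U with T."""
    return rna.upper().replace("U", "T")


def guide_matches_target(sgrna: str, target_dna: str) -> bool:
    """Return True if the 5' end of the sgRNA equals the DNA target."""
    guide = rna_to_dna(sgrna[: len(target_dna)])
    return guide == target_dna.upper()


def pam_matches(pam: str, pattern: str) -> bool:
    """Compare a PAM with a pattern in which 'N' matches any base."""
    if len(pam) != len(pattern):
        return False
    return all(p == "N" or p == b for b, p in zip(pam.upper(), pattern.upper()))


if __name__ == "__main__":
    print("guide matches Sp target:", guide_matches_target(SGRNA_PLUS_47, SP_TARGET))
    print("Sp PAM fits assumed pattern:", pam_matches(SP_PAM, ASSUMED_SP_PAM_PATTERN))
    print("St3 PAM fits assumed pattern:", pam_matches(ST3_PAM, ASSUMED_ST3_PAM_PATTERN))
```

The same check could be applied to any of the other sgRNA variants listed, since they share the same 20-nucleotide guide at the 5' end.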

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

REFERENCES

-   1. Urnov, F. D., Rebar, E. J., Holmes, M. C., Zhang, H. S. & Gregory, P. D. Genome editing with engineered zinc finger nucleases. Nat. Rev. Genet. 11, 636-646 (2010).
-   2. Bogdanove, A. J. & Voytas, D. F. TAL effectors: customizable proteins for DNA targeting. Science 333, 1843-1846 (2011).
-   3. Stoddard, B. L. Homing endonuclease structure and function. Q. Rev. Biophys. 38, 49-95 (2005).
-   4. Bae, T. & Schneewind, O. Allelic replacement in Staphylococcus aureus with inducible counter-selection. Plasmid 55, 58-63 (2006).
-   5. Sung, C. K., Li, H., Claverys, J. P. & Morrison, D. A. An rpsL cassette, janus, for gene replacement through negative selection in Streptococcus pneumoniae. Appl. Environ. Microbiol. 67, 5190-5196 (2001).
-   6. Sharan, S. K., Thomason, L. C., Kuznetsov, S. G. & Court, D. L. Recombineering: a homologous recombination-based method of genetic engineering. Nat. Protoc. 4, 206-223 (2009).
-   7. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821 (2012).
-   8. Deveau, H., Garneau, J. E. & Moineau, S. CRISPR/Cas system and its role in phage-bacteria interactions. Annu. Rev. Microbiol. 64, 475-493 (2010).
-   9. Horvath, P. & Barrangou, R. CRISPR/Cas, the immune system of bacteria and archaea. Science 327, 167-170 (2010).
-   10. Terns, M. P. & Terns, R. M. CRISPR-based adaptive immune systems. Curr. Opin. Microbiol. 14, 321-327 (2011).
-   11. van der Oost, J., Jore, M. M., Westra, E. R., Lundgren, M. & Brouns, S. J. CRISPR-based adaptive and heritable immunity in prokaryotes. Trends. Biochem. Sci. 34, 401-407 (2009).
-   12. Brouns, S. J. et al. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321, 960-964 (2008).
-   13. Carte, J., Wang, R., Li, H., Terns, R. M. & Terns, M. P. Cas6 is an endoribonuclease that generates guide RNAs for invader defense in prokaryotes. Genes Dev. 22, 3489-3496 (2008).
-   14. Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471, 602-607 (2011).
-   15. Hatoum-Aslan, A., Maniv, I. & Marraffini, L. A. Mature clustered, regularly interspaced, short palindromic repeats RNA (crRNA) length is measured by a ruler mechanism anchored at the precursor processing site. Proc. Natl. Acad. Sci. U.S.A. 108, 21218-21222 (2011).
-   16. Haurwitz, R. E., Jinek, M., Wiedenheft, B., Zhou, K. & Doudna, J. A. Sequence- and structure-specific RNA processing by a CRISPR endonuclease. Science 329, 1355-1358 (2010).
-   17. Deveau, H. et al. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190, 1390-1400 (2008).
-   18. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl. Acad. Sci. U.S.A. (2012).
-   19. Makarova, K. S., Aravind, L., Wolf, Y. I. & Koonin, E. V. Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems. Biol. Direct. 6, 38 (2011).
-   20. Barrangou, R. RNA-mediated programmable DNA cleavage. Nat. Biotechnol. 30, 836-838 (2012).
-   21. Brouns, S. J. Molecular biology. A Swiss army knife of immunity. Science 337, 808-809 (2012).
-   22. Carroll, D. A CRISPR Approach to Gene Targeting. Mol. Ther. 20, 1658-1660 (2012).
-   23. Bikard, D., Hatoum-Aslan, A., Mucida, D. & Marraffini, L. A. CRISPR interference can prevent natural transformation and virulence acquisition during in vivo bacterial infection. Cell Host Microbe 12, 177-186 (2012).
-   24. Sapranauskas, R. et al. The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli. Nucleic Acids Res. (2011).
-   25. Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc. Natl. Acad. Sci. U.S.A. (2011).
-   26. Wiedenheft, B. et al. RNA-guided complex from a bacterial immune system enhances target recognition through seed sequence interactions. Proc. Natl. Acad. Sci. U.S.A. (2011).
-   27. Zahner, D. & Hakenbeck, R. The Streptococcus pneumoniae beta-galactosidase is a surface protein. J. Bacteriol. 182, 5919-5921 (2000).
-   28. Marraffini, L. A., Dedent, A. C. & Schneewind, O. Sortases and the art of anchoring proteins to the envelopes of gram-positive bacteria. Microbiol. Mol. Biol. Rev. 70, 192-221 (2006).
-   29. Motamedi, M. R., Szigety, S. K. & Rosenberg, S. M. Double-strand-break repair recombination in Escherichia coli: physical evidence for a DNA replication mechanism in vivo. Genes Dev. 13, 2889-2903 (1999).
-   30. Hosaka, T. et al. The novel mutation K87E in ribosomal protein S12 enhances protein synthesis activity during the late growth phase in Escherichia coli. Mol. Genet. Genomics 271, 317-324 (2004).
-   31. Costantino, N. & Court, D. L. Enhanced levels of lambda Red-mediated recombinants in mismatch repair mutants. Proc. Natl. Acad. Sci. U.S.A. 100, 15748-15753 (2003).
-   32. Edgar, R. & Qimron, U. The Escherichia coli CRISPR system protects from lambda lysogenization, lysogens, and prophage induction. J. Bacteriol. 192, 6291-6294 (2010).
-   33. Marraffini, L. A. & Sontheimer, E. J. Self versus non-self discrimination during CRISPR RNA-directed immunity. Nature 463, 568-571 (2010).
-   34. Fischer, S. et al. An archaeal immune system can detect multiple Protospacer Adjacent Motifs (PAMs) to target invader DNA. J. Biol. Chem. 287, 33351-33363 (2012).
-   35. Gudbergsdottir, S. et al. Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with vector-borne viral and plasmid genes and protospacers. Mol. Microbiol. 79, 35-49 (2011).
-   36. Wang, H. H. et al. Genome-scale promoter engineering by coselection MAGE. Nat Methods 9, 591-593 (2012).
-   37. Cong L, Ran F A, Cox D, Lin S, Barretto R, Habib N, Hsu P D, Wu X, Jiang W, Marraffini L A, Zhang F. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science. 2013 Feb. 15; 339(6121):819-23.
-   38. Mali, P., Yang, L., Esvelt, K. M., Aach, J., Guell, M., DiCarlo, J. E., Norville, J. E., and Church, G. M. (2013b). RNA-guided human genome engineering via Cas9. Science 339, 823-826.
-   39. Hoskins, J. et al. Genome of the bacterium Streptococcus pneumoniae strain R6. J. Bacteriol. 183, 5709-5717 (2001).
-   40. Havarstein, L. S., Coomaraswamy, G. & Morrison, D. A. An unmodified heptadecapeptide pheromone induces competence for genetic transformation in Streptococcus pneumoniae. Proc. Natl. Acad. Sci. U.S.A. 92, 11140-11144 (1995).
-   41. Horinouchi, S. & Weisblum, B. Nucleotide sequence and functional map of pC194, a plasmid that specifies inducible chloramphenicol resistance. J. Bacteriol. 150, 815-825 (1982).
-   42. Horton, R. M. In Vitro Recombination and Mutagenesis of DNA: SOEing Together Tailor-Made Genes. Methods Mol. Biol. 15, 251-261 (1993).
-   43. Podbielski, A., Spellerberg, B., Woischnik, M., Pohl, B. & Lutticken, R. Novel series of plasmid vectors for gene inactivation and expression analysis in group A streptococci (GAS). Gene 177, 137-147 (1996).
-   44. Husmann, L. K., Scott, J. R., Lindahl, G. & Stenberg, L. Expression of the Arp protein, a member of the M protein family, is not sufficient to inhibit phagocytosis of Streptococcus pyogenes. Infection and immunity 63, 345-348 (1995).
-   45. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6, 343-345 (2009).
-   46. Garneau J. E. et al. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468, 67-71 (04 November 2010).
-   47. Barrangou R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007 Mar. 23; 315(5819):1709-12.
-   48. Ishino Y. et al. Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isozyme conversion in Escherichia coli, and identification of the gene product. J Bacteriol. 1987 December; 169(12):5429-33.
-   49. Mojica F. J. M et al. Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Molecular Microbiology (2000) 36(1), 244-246.
-   50. Jansen R. et al. Identification of genes that are associated with DNA repeats in prokaryotes. Molecular Microbiology (2002) 43(6), 1565-1575.
-   51. Gouet, P., Courcelle, E., Stuart, D. I., and Metoz, F. (1999). ESPript: analysis of multiple sequence alignments in PostScript. Bioinformatics 15, 305-308.
-   52. Notredame, C., Higgins, D. G., and Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 302, 205-217.
-   53. Adams, P. D., Grosse-Kunstleve, R. W., Hung, L. W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K., and Terwilliger, T. C. (2002). PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr D Biol Crystallogr 58, 1948-1954.
-   54. Ariyoshi, M., Vassylyev, D. G., Iwasaki, H., Nakamura, H., Shinagawa, H., and Morikawa, K. (1994). Atomic structure of the RuvC resolvase: a holliday junction-specific endonuclease from E. coli. Cell 78, 1063-1072.
-   55. Biertumpfel, C., Yang, W., and Suck, D. (2007). Crystal structure of T4 endonuclease VII resolving a Holliday junction. Nature 449, 616-620.
-   56. Chen, L., Shi, K., Yin, Z., and Aihara, H. (2013). Structural asymmetry in the Thermus thermophilus RuvC dimer suggests a basis for sequential strand cleavages during Holliday junction resolution. Nucleic acids research 41, 648-656.
-   57. delaFortelle, E., and Bricogne, G. (1997). Maximum-likelihood heavy-atom parameter refinement for multiple isomorphous replacement and multiwavelength anomalous diffraction methods. Methods Enzymol 276, 472-494.
-   58. Emsley, P., and Cowtan, K. (2004). Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60, 2126-2132.
-   59. Fonfara, I., Le Rhun, A., Chylinski, K., Makarova, K. S., Lecrivain, A. L., Bzdrenga, J., Koonin, E. V., and Charpentier, E. (2013). Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems. Nucleic acids research.
-   60. Fu, Y., Foden, J. A., Khayter, C., Maeder, M. L., Reyon, D., Joung, J. K., and Sander, J. D. (2013). High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nature biotechnology 31, 822-826.
-   61. Gilbert, L. A., Larson, M. H., Morsut, L., Liu, Z., Brar, G. A., Torres, S. E., Stern-Ginossar, N., Brandman, O., Whitehead, E. H., Doudna, J. A., et al. (2013). CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442-451.
-   62. Gorecka, K. M., Komorowska, W., and Nowotny, M. (2013). Crystal structure of RuvC resolvase in complex with Holliday junction substrate. Nucleic Acids Res 41, 9945-9955.
-   63. Gratz, S. J., Cummings, A. M., Nguyen, J. N., Hamm, D. C., Donohue, L. K., Harrison, M. M., Wildonger, J., and O'Connor-Giles, K. M. (2013). Genome engineering of Drosophila with the CRISPR RNA-guided Cas9 nuclease. Genetics 194, 1029-1035.
-   64. Holm, L., and Rosenstrom, P. (2010). Dali server: conservation mapping in 3D. Nucleic acids research 38, W545-549.
-   65. Hsu, P. D., Scott, D. A., Weinstein, J. A., Ran, F. A., Konermann, S., Agarwala, V., Li, Y., Fine, E. J., Wu, X., Shalem, O., et al. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nature biotechnology 31, 827-832.
-   66. Hwang, W. Y., Fu, Y., Reyon, D., Maeder, M. L., Tsai, S. Q., Sander, J. D., Peterson, R. T., Yeh, J. R., and Joung, J. K. (2013). Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature biotechnology 31, 227-229.
-   67. Kabsch, W. (2010). Xds. Acta crystallographica Section D, Biological crystallography 66, 125-132.
-   68. Konermann, S., Brigham, M. D., Trevino, A. E., Hsu, P. D., Heidenreich, M., Cong, L., Platt, R. J., Scott, D. A., Church, G. M., and Zhang, F. (2013). Optical control of mammalian endogenous transcription and epigenetic states. Nature 500, 472-476.
-   69. Li, C. L., Hor, L. I., Chang, Z. F., Tsai, L. C., Yang, W. Z., and Yuan, H. S. (2003). DNA binding and cleavage by the periplasmic nuclease Vvn: a novel structure with a known active site. The EMBO journal 22, 4014-4025.
-   70. Maeder, M. L., Linder, S. J., Cascio, V. M., Fu, Y., Ho, Q. H., and Joung, J. K. (2013). CRISPR RNA-guided activation of endogenous human genes. Nature methods 10, 977-979.
-   71. Mali, P., Aach, J., Stranges, P. B., Esvelt, K. M., Moosburner, M., Kosuri, S., Yang, L., and Church, G. M. (2013a). CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nature biotechnology 31, 833-838.
-   72. Marraffini, L. A., and Sontheimer, E. J. (2008). CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA. Science 322, 1843-1845.
-   73. Marraffini, L. A., and Sontheimer, E. J. (2010). CRISPR interference: RNA-directed adaptive immunity in bacteria and archaea. Nat Rev Genet 11, 181-190.
-   74. Mojica, F. J., Diez-Villasenor, C., Garcia-Martinez, J., and Almendros, C. (2009). Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology 155, 733-740.
-   75. Pattanayak, V., Lin, S., Guilinger, J. P., Ma, E., Doudna, J. A., and Liu, D. R. (2013).
High-throughput profiling of off-target    DNA cleavage reveals RNA-programmed Cas9 nuclease specificity.    Nature biotechnology 31, 839-843.-   76. Perez-Pinera, P., Kocak, D. D., Vockley, C. M., Adler, A. F.,    Kabadi, A. M., Polstein, L. R., Thakore, P. I., Glass, K. A.,    Ousterout, D. G., Leong, K. W., et al. (2013). RNA-guided gene    activation by CRISPR-Cas9-based transcription factors. Nature    methods 10, 973-976.-   77. Qi, L. S., Larson, M. H., Gilbert, L. A., Doudna, J. A.,    Weissman, J. S., Arkin, A. P., and Lim, W. A. (2013). Repurposing    CRISPR as an RNA-guided platform for sequence-specific control of    gene expression. Cell 152, 1173-1183.-   78. Ran, F. A., Hsu, P. D., Lin, C. Y., Gootenberg, J. S.,    Konermann, S., Trevino, A. E., Scott, D. A., Inoue, A., Matoba, S.,    Zhang, Y., et al. (2013). Double nicking by RNA-guided CRISPR Cas9    for enhanced genome editing specificity. Cell 154, 1380-1389.-   79. Sampson, T. R., Saroj, S. D., Llewellyn, A. C., Tzeng, Y. L.,    and Weiss, D. S. (2013). A CRISPR/Cas system mediates bacterial    innate immune evasion and virulence. Nature 497, 254-257.-   80. Shalem, O., Sanjana, N. E., Hartenian, E., Shi, X., Scott, D.    A., Mikkelsen, T. S., Heckl, D., Ebert, B. L., Root, D. E.,    Doench, J. G., et al. (2014). Genome-scale CRISPR-Cas9 knockout    screening in human cells. Science 343, 84-87.-   81. Sheldrick, G. M. (2008). A short history of SHELX. Acta    crystallographica Section A, Foundations of crystallography 64,    112-122.-   82. Spilman, M., Cocozaki, A., Hale, C., Shao, Y., Ramia, N., Terns,    R., Terns, M., Li, H., and Stagg, S. (2013). Structure of an RNA    silencing complex of the CRISPR-Cas immune system. Molecular cell    52, 146-152.-   83. Wang, H., Yang, H., Shivalila, C. S., Dawlaty, M. M., Cheng, A.    W., Zhang, F., and Jaenisch, R. (2013). One-step generation of mice    carrying mutations in multiple genes by CRISPR/Cas-mediated genome    engineering. 
Cell 153, 910-918.-   84. Wang, T., Wei, J. J., Sabatini, D. M., and Lander, E. S. (2014).    Genetic screens in human cells using the CRISPR-Cas9 system. Science    343, 80-84.-   85. Wiedenheft, B., Lander, G. C., Zhou, K., Jore, M. M., Brouns, S.    J., van der Oost, J., Doudna, J. A., and Nogales, E. (2011).    Structures of the RNA-guided surveillance complex from a bacterial    immune system. Nature 477, 486-489.-   86. Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J.,    Emsley, P., Evans, P. R., Keegan, R. M., Krissinel, E. B.,    Leslie, A. G., McCoy, A., et al. (2011). Overview of the CCP4 suite    and current developments. Acta crystallographica Section D,    Biological crystallography 67, 235-242.-   87. Yang, H., Wang, H., Shivalila, C. S., Cheng, A. W., Shi, L., and    Jaenisch, R. (2013). One-step generation of mice carrying reporter    and conditional alleles by CRISPR/Cas-mediated genome engineering.    Cell 154, 1370-1379.

LENGTHY TABLES

The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO website (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20200080067A1). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

1-11. (canceled)
 12. A method for reducing interaction of a Streptococcus pyogenes Cas9 (SpCas9) protein with a target DNA, comprising mutating one or more residues at Asn497, Trp659, Arg661, Gln695, Gln926, Glu1108, Leu169, Tyr450, Met495, Met694, or His698 of a corresponding wild-type SpCas9.
 13. A method for reducing interaction of a Streptococcus pyogenes Cas9 (SpCas9) protein with backbone phosphate groups of a target DNA, comprising mutating one or more residues at Asn497, Trp659, Arg661, Gln695, Gln926, or Glu1108 of a corresponding wild-type SpCas9.
 14. A method for reducing interaction of a Streptococcus pyogenes Cas9 (SpCas9) protein with C2′ atoms of a target DNA, comprising mutating one or more residues at Leu169, Tyr450, Met495, Met694, or His698 of a corresponding wild-type SpCas9.
 15. A composition comprising a Streptococcus pyogenes Cas9 (SpCas9) protein having reduced interaction with a target DNA, wherein the SpCas9 protein comprises one or more mutations at Asn497, Trp659, Arg661, Gln695, Gln926, Glu1108, Leu169, Tyr450, Met495, Met694, or His698 of a corresponding wild-type SpCas9.
 16. The composition of claim 15, wherein the SpCas9 protein comprises one or more mutations at Asn497, Trp659, Arg661, Gln695, Gln926, or Glu1108 which reduce interaction with backbone phosphate groups of the target DNA.
 17. The composition of claim 15, wherein the SpCas9 protein comprises one or more mutations at Leu169, Tyr450, Met495, Met694, or His698 which reduce interaction with C2′ atoms of the target DNA.
 18. The composition of claim 15, further comprising a CRISPR-Cas system guide RNA.
 19. The composition of claim 18, wherein the SpCas9 protein and the guide RNA form a CRISPR-Cas complex.